McGraw-Hill Series in Psychology


Statistical Analysis in Psychology and Education 




STATISTICAL ANALYSIS IN PSYCHOLOGY AND EDUCATION 


Copyright © 1959 by the McGraw-Hill Book Company, Inc. Printed 
in the United States of America. All rights reserved. This book, or 
parts thereof, may not be reproduced in any form without permission
of the publishers. Library of Congress Catalog Card Number 59-10708


THE MAPLE PRESS COMPANY, YORK, PA.




PREFACE 


The object of this book is to introduce students and research workers in 
psychology and education to the concepts and applications of statistics. 
Emphasis is placed on the analysis and interpretation of data resulting from 
the conduct of experiments. Students and investigators in experimental 
medicine, psychiatry, sociology, and other disciplines may also find the book 
useful.

This book may be used as a text for either a one-semester or a full course
in statistics. When used as a text for a one-semester course the instructor 
may exercise a choice in the selection of material. The selection will usually 
include Chapters 1 to 9, most of Chapter 10, and possibly a few sections 
in some of the remaining chapters. Different instructors hold somewhat 
divergent views with regard to the content of introductory courses in
statistics. This book has been designed to permit the instructor some freedom
of choice in the selection of course content. 

I have attempted not only to introduce the student to the practical 
technology of statistics but also to explain in a nonmathematical and fre- 
quently intuitive way the nature of statistical ideas. This is not always 
easy. Obviously, the extent to which an understanding of statistics can be 
communicated without some mathematical knowledge is limited. Skill 
in high school or freshman algebra will prove most helpful to the student. 

The writing of a book of this type demands numerous compromises between 
a tidy logical arrangement of material, sound pedagogy, and common usage, 
which are not always compatible. The desire for completeness has led to 
the inclusion of an occasional section which perhaps should not be included 
in an introductory text. The instructor can readily identify these sections 
and omit them if he chooses. 

Differences of viewpoint exist among statisticians on a variety of technical 
points. In writing a book of this type an author must on occasion proceed 
in full awareness that he is open to criticism regardless of the viewpoint he 
selects. An example here resides in the choice between N and N — 1 in the 
initial definition of the standard deviation and variance. Advantages and 
disadvantages attach to both procedures. Adopting the more traditional 
procedure, I have chosen N, a choice which subsequently requires a discussion
of biased and unbiased variance estimates. This book contains what I
believe to be an up-to-date discussion of the analysis of variance for two-way 
classification. Some readers may feel that a discussion of higher-order
classifications should have been included. My view is that a thorough
understanding of the two-way classification case in all its aspects is a neces- 
sary preliminary to further work in the analysis of variance. When the 
two-way classification case is fully understood, extension to higher-order 
classifications is relatively easy. 

The usefulness of this book is enhanced by the kindness of authors and 
publishers who have permitted the adaptation and reproduction of tables 
and other materials published originally by them. I should like to express 
my gratitude to Francis G. Cornell, Allen L. Edwards, R. W. B. Jackson, 
Palmer O. Johnson, M. G. Kendall, John F. Kenny, Don Lewis, Quinn 
McNemar, Edwin G. Olds, George W. Snedecor, Herbert Sorenson, James E.
Wert, and Frank Wilcoxon and to the Scottish Council for Research in 
Education, the University of London Press, Charles Griffin & Co., Prentice- 
Hall, Inc., John Wiley & Sons, Inc., D. Van Nostrand Company, Inc., 
Rinehart & Company, Inc., Iowa State College Press, and the Annals of 
Mathematical Statistics. I am indebted to Professor Sir Ronald A. Fisher,
Cambridge, to Dr. Frank Yates, Rothamsted, and to Messrs. Oliver & 
Boyd, Ltd., Edinburgh, for permission to reprint Tables III, IV, and VI of 
their book Statistical Tables for Biological, Agricultural, and Medical Research. 

I should like to express here my indebtedness to the late Sir Godfrey H.
Thomson and to W. G. Emmett and D. N. Lawley, all of the University of 
Edinburgh. These three are responsible for my persisting interest in the 
applications of statistical method to psychological problems. In particular, 
I should like to express my gratitude to Lady Thomson for permission to 
reproduce certain tables from Sir Godfrey’s work. 

This book has benefited greatly by many constructive criticisms and 
suggestions on the manuscript from Julian C. Stanley of the University of 
Wisconsin. I need hardly add that he is in no way responsible for errors 
and omissions. I am also grateful to Samuel Fillenbaum of the University 
of North Carolina, who read the manuscript and made many useful comments. 

In conclusion, I must express my great indebtedness to Miss Beverley 
Houghton for her patience, skill, and painstaking care in the difficult task 
of typing the manuscript. 


George A. Ferguson


CONTENTS


Preface

 1. Basic Ideas in Statistics
 2. Frequency Distributions and Their Graphic Representation
 3. Averages
 4. Measures of Variation, Skewness, and Kurtosis
 5. Probability and the Binomial Distribution
 6. The Normal Curve
 7. Correlation
 8. Prediction in Relation to Correlation
 9. Essential Ideas of Sampling
10. Tests of Significance
11. Chi Square
12. Rank Correlation Methods
13. Other Varieties of Correlation
14. Transformations: Their Nature and Purpose
15. Analysis of Variance: One-way Classification
16. Analysis of Variance: Two-way Classification
17. Selected Nonparametric Tests
18. Errors of Measurement
19. Partial and Multiple Correlation

Appendix
Glossary of Symbols
References
Index




CHAPTER 1 


BASIC IDEAS IN STATISTICS 


1.1. Introduction 


This book is concerned with the elementary statistical treatment of 
experimental data in psychology, education, and related disciplines. The 
data resulting from any experiment are usually a collection of observations 
or measurements. The conclusions to be drawn from the experiment cannot
be reliably ascertained by simple direct inspection of the data. Classifica- 
tion, summary description, and rules of evidence for the drawing of valid 
inference are required. Statistics provides the methodology whereby this 
can be done.

Implicit in any experiment is the presumption that it is possible to argue 
validly from the particular to the general and that new knowledge can be 
obtained by the process of inductive inference. The statistician does not 
assume that such arguments can be made with certainty. On the contrary, 
he assumes that some degree of uncertainty must attach to all such argu- 
ments; that some of the inferences drawn from the data of experiments are 
wrong. He further assumes that the uncertainty itself is amenable to precise 
and rigorous treatment, that it is possible to make rigorous statements about 
the uncertainty which attaches to any particular inference. Thus in the 
uncertain milieu of experimentation he applies a rigorous method. 

A knowledge of statistics is an essential part of the training of all students 
in psychology. There are many reasons for this. First, an understanding 
of the modern literature of psychology requires a knowledge of statistical 
method and modes of thought. A high proportion of current books and 
journal articles either report experimental findings in statistical form or 
present theories or arguments involving statistical concepts. These concepts 
play an increasing role in our thinking about psychological problems, quite 
apart from the treatment of data. The student need only consider, for 
example, the role of statistical concepts in current lines of theorizing in the 
field of learning to grasp the force of this argument. Second, training in 
psychology at an advanced level requires that the student himself design and 
conduct experiments. The design of an experiment is inseparable from the 
statistical treatment of the results. Experiments must be designed to enable
the treatment of results in such a way as to permit an adequate test of the
hypothesis that led to the conduct of the experiment in the first place. If
the design of an experiment is faulty, no amount of statistical manipulation 
can lead to the drawing of valid inferences. Experimental design and 
statistical procedures are two sides of the same coin. Thus not only must 
the advanced student conduct experiments and interpret results, he must 
plan his experiments in such a way that the interpretation of results can 
conform to known rules of scientific evidence. Third, training in statistics 
is training in scientific method. Statistical inference is scientific inference, 
which in turn is inductive inference, the making of general statements from 
the study of particular cases. These terms are for all practical purposes, and 
at a certain level of generality, synonymous. Statistics attempts to make 
induction rigorous. Induction is regarded by some scholars as the only way 
in which new knowledge comes into the world. While this statement is 
debatable, the role in modern society of scientific discovery through induction 
is obviously of the greatest importance. For this reason no serious student 
of psychology, or any other discipline, can afford not to know something of 
the rudiments of the scientific approach to problems. Statistical procedures 
and ideas play an important role in this approach. 


1.2. The Broad Role of Quantification in Psychology 


While this book is largely concerned with elementary statistical procedures 
and ideas, some mention may be made of the broad role of quantitative 
method in psychology. 

The attempt to quantify has a long and distinguished history in experi- 
mental psychology, which indeed may be regarded as synonymous with the 
history of that science itself. Since the experimental work in psychophysics 
of E. H. Weber and Gustav Fechner in the nineteenth century, determined 
attempts have been made to develop psychology as an experimental science. 
The early psychophysicists were concerned with the relationship between the 
"mind" and the “body” and developed certain mathematical functions 
which they held to be descriptive of that relationship. While much of their 
thinking on the mind-body problem has been discarded, their methods and 
techniques, with development and elaboration, are still used. Shorn of its
philosophical and theoretical encumbrances, the work of the early psycho- 
physicists was reduced in effect to the study of the relationship between 
measurements, obtained in two different ways, of what were presumed to be 
the same property. Thus, for example, they studied the relationship between 
weight, length, and temperature, defined by the responses of human subjects 
as instruments, and weight, length, and temperature, defined by other measur- 
ing instruments, scales, foot rules, and thermometers. A psychophysical 
law, so called, is a statement of the relationship between measurements 
obtained by these two methods. Modern psychophysics is concerned to some
considerable extent with the scaling of the responses of the human subject as
instrument and with the use of the human subject as instrument in dealing 
with a wide variety of practical problems. It may perhaps be referred to as 
human instrumentation. 

The early psychophysicists invented certain experimental methods and 
developed statistical procedures for handling the data obtained by these 
methods. It is of interest to note that one method, the constant process, 
developed by G. E. Müller and F. M. Urban, has recently, with modification, 
found application in biological-assay work in assessing the potency of 
hormones, toxicants, and drugs of all types. It is currently known in biology 
as the method of probits (Finney 1944, 1947). 

Statistical methods have found extensive application in the psychological 
testing field and in the study of human ability. Since the time of Binet, who 
developed the first extensively used test of intelligence and whose thinking 
was influenced by the early psychophysicists, a comprehensive body of 
theory and technique has been developed which is primarily statistical in 
type. This body of theory and technique is concerned with the construction 
of instruments for measuring human ability, personality characteristics, 
attitudes, interests, and many other aspects of behavior; with the nature and 
magnitude of the errors involved in such measurement; with the logical 
conditions which such measuring instruments must satisfy; with the quantita- 
tive prediction of human behavior; and with other related topics. 

The use of psychological tests stimulated the development of the techniques 
of factor analysis, which are used to some extent in contemporary psychology. 
Problems arise which involve a study of the relationships between sets of 
variables, sometimes as many as 50 or 60 and perhaps more. Factor analysis 
attempts to provide a simplified description of these relationships, which 
facilitates an interpretation and comprehension of the information in the 
data. Factor analysis has found a number of uses in branches of science 
other than psychology, including meteorology and agriculture. Some of the 
problems in factor analysis have not as yet been fully resolved. 

Within recent years frequent use has been made of statistical concepts in 
the construction of models designed to provide some explanation and under- 
standing of observable phenomena. Such models are used in the field of 
learning. Further, many biological scientists are currently concerned with 
the construction of models which may possibly bear some correspondence to 
the functioning of certain aspects of the central nervous system. While 
these attempts may be premature and their success cannot at this time be 
evaluated, it is possible that in future the models which will prove helpful in 
understanding the functioning of the human brain will either implicitly or 
explicitly involve statistical concepts. In a system comprised of a complex 
network of nerve fibers, the transmission of impulses can be conceived in 
probabilistic terms. 




While the avenues of quantification mentioned above do not fall within 
the context of this book, their study demands a knowledge of statistical 
method and a comprehension of the basic ideas of statistics as a starting 
point. It would seem that as psychology develops, increasing emphasis will 
be placed on quantitative procedure and an increasing degree of statistical 
sophistication will be required of the student. 


1.3. Statistics as the Study of Populations 


Statistics is a branch of scientific methodology. It deals with the collec- 
tion, classification, description, and interpretation of data obtained by the 
conduct of surveys and experiments. Its essential purpose is to describe and 
draw inferences about the numerical properties of populations. The terms 
population and numerical property require clarification. 

In everyday language the term population is used to refer to groups or 
aggregates of people. We speak, for example, of the population of the United 
States, or of the state of Texas, or of the city of New York, meaning by this 
all the people who occupy defined geographical regions at specified times. 
This, however, is a particular usage of the term population. The statistician 
employs the term in a more general sense to refer not only to defined groups 
or aggregates of people, but also to defined groups or aggregates of animals, 
objects, materials, measurements, or “things” or “happenings” of any kind. 
Thus the statistician may define, for his particular purposes, populations of 
laboratory animals, trees, nerve fibers, liquids, soil, manufactured articles, 
automobile accidents, microorganisms, birds’ eggs, insects, or fishes in the sea. 
On occasion he may deal with a population of measurements. By this is 
meant an indefinitely large aggregate of measurements which, hypothetically, 
might be obtained under specified experimental conditions. To illustrate, a 
series of measurements might be made of the length of a desk. Some or all of 
these measurements may differ one from another because of the presence of 
errors of measurement. This series of measurements may be regarded as part 
of an indefinitely large aggregate or population of measurements which might, 
hypothetically, be obtained by measuring the length of the desk over and 
over again an indefinitely large number of times. 

The general concept implicit in all these particular uses of the word popula- 
tion is that of group or aggregation. The statistician’s concern is with 
properties which are descriptive of the group or aggregation itself rather 
than with properties of particular members. Thus measurements may be
made of the height and weight of a group of individuals. These measure- 
ments may be added together and divided by the number of cases to obtain 
the mean height and weight. These means describe a property of the group 
as a whole and are not descriptive of particular individuals. To illustrate
further, a child may have an IQ of 90 and belong to a high socioeconomic
group. Another child may have an IQ of 120 and belong to a low socio-
economic group. These facts as such about individual children do not
directly concern the statistician. If, however, questions are raised about the 
proportion of children in a particular population or subpopulation with IQ’s 
above or below a specified value, or if more general questions are raised about 
the relationship between intelligence and socioeconomic level, then these are 
questions of a statistical nature, and the statistician has techniques which 
assist their exploration. 
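
The two kinds of group property mentioned above, a mean and a proportion above a specified value, can be sketched in a few lines of Python. The scores and the cutoff of 100 are invented for illustration and are not data from any study.

```python
# Hypothetical IQ scores for a small group of children (illustrative only).
iq_scores = [90, 120, 105, 98, 112, 87, 101, 95]

# The mean: add the scores together and divide by the number of cases.
group_mean = sum(iq_scores) / len(iq_scores)

# Proportion of the group with an IQ above a specified value.
cutoff = 100  # arbitrary value chosen for the example
proportion_above = sum(1 for iq in iq_scores if iq > cutoff) / len(iq_scores)

print(f"mean IQ of the group: {group_mean}")
print(f"proportion above {cutoff}: {proportion_above}")
```

Neither number describes any one child; both are properties of the group as a whole.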

The distinction is sometimes made between finite and infinite populations. 
The children attending school in the city of Chicago, the inmates of peni- 
tentiaries in Ontario, the cards in a deck are examples of finite populations. 
The members of such a population can presumably be counted, and a finite 
number obtained. The possible rolls of a die and the possible observations
in many scientific experiments are examples of infinite or indefinitely large 
populations. The number of rolls of a die or the number of scientific observa- 
tions may, at least theoretically, be increased without any finite limit. In 
many situations the populations which the statistician proposes to describe 
are finite, but so large that for all practical purposes they may be regarded as 
infinite. The 175 million or so people living in the United States constitute a
large but finite population. This population is so large that for many types 
of statistical inference it may be assumed to be infinite. This would not 
apply to the cards in a deck, which may be thought of as a small finite popula- 
tion of 52 members. 

Most populations are comprised of naturally distinguishable members, as 
is, of course, the case with people, animals, measurements, or the rolls of a die. 
Some populations are not so comprised, as is the case with liquids, soils, woven 
fabrics, or, for that matter, human behavior. How is it possible to apply 
the concept of group or aggregation to populations of this latter type? This 
may be done by defining the population member arbitrarily as a liter, a cubic 
centimeter, a square yard, or some such unit. The whole population may be 
thought to be composed of an aggregate of such members. Likewise, in the 
study of human behavior, the psychologist frequently concerns himself with 
arbitrarily defined bits of behavior, although behavior as such may perhaps 
be regarded as a continuous flow or sequence. 

Statistics is concerned with the numerical properties of populations, that 
is, with properties to which numerals can in some manner be assigned. The 
logical implications of the term numerical property are complex and need not 
be elaborated here. To illustrate briefly, however, in any population of 
mental-hospital patients some may be classed as psychoneurotic, others as 
schizophrenic psychotic, others as psychotic with organic brain disease, and 
so on. Further, some patients may come from broken homes, while others 
may have a normal healthy home background. Some may have a history of 
mental disease in the family, and others may not. We may be said to apply
a statistical method when we concern ourselves with how many patients in
the population fall within these various classes, that is, how many are 
psychoneurotic, schizophrenic psychotic, and the like, and how many come 
from broken homes, how many do not, and so on. Further, the flicker fusion
rates of some part or all of the population may be measured and attention 
directed to the numbers of patients who fall within specified ranges of flicker 
fusion rate, to mean rates for various classes of patients, and to related 
problems. The investigation of such problems as these may be said to 
involve a statistical method. In general, the statistician's concern is with 
those properties of populations which can be expressed in numerical form. 
In summary, statistics is a methodology for the exploration of, and the 
making of statements about, the properties of groups or aggregates called 
populations. These statements involve the use of numbers. These delimita- 
tions of the referent of the word statistics are adequate for the purposes of 
this book, although quite clearly further delimitations may usefully be made. 
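
As a rough illustration of the mental-hospital example in this section, the following Python sketch counts how many patients fall within each class and computes a mean flicker fusion rate for each class. The records, field names, and rates are hypothetical and serve only to show the kind of tallying involved.

```python
from collections import Counter, defaultdict

# Hypothetical patient records (invented for illustration only).
patients = [
    {"diagnosis": "psychoneurotic", "broken_home": True,  "flicker_rate": 38.0},
    {"diagnosis": "schizophrenic",  "broken_home": False, "flicker_rate": 35.0},
    {"diagnosis": "psychoneurotic", "broken_home": False, "flicker_rate": 40.0},
    {"diagnosis": "organic",        "broken_home": True,  "flicker_rate": 30.0},
    {"diagnosis": "schizophrenic",  "broken_home": True,  "flicker_rate": 33.0},
    {"diagnosis": "psychoneurotic", "broken_home": False, "flicker_rate": 42.0},
]

# How many patients fall within each diagnostic class?
diagnosis_counts = Counter(p["diagnosis"] for p in patients)

# How many come from broken homes, and how many do not?
home_counts = Counter(p["broken_home"] for p in patients)

# Mean flicker fusion rate for each class of patient.
rates_by_class = defaultdict(list)
for p in patients:
    rates_by_class[p["diagnosis"]].append(p["flicker_rate"])
mean_rate_by_class = {d: sum(r) / len(r) for d, r in rates_by_class.items()}

print(diagnosis_counts)
print(home_counts)
print(mean_rate_by_class)
```

Labels are turned into numbers by counting class membership, while the measured variable is summarized by a mean within each class.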


1.4. Samples and Sampling 


Many populations are either large but finite or indefinitely large. Con- 
sequently, it becomes either impracticable or impossible for the investigator 
to produce statistics based on all members. If, for example, interest is in 
investigating the attitudes of adult Canadians toward immigrants, it would 
obviously be a prohibitively expensive and time-consuming task to measure 
the attitudes of all adult Canadians and produce statistics based on a study 
of the complete population. If a population is indefinitely large, it is of 
course impossible, ipso facto, to produce complete population statistics. 
Under circumstances such as these the investigator draws what is spoken of 
as a sample. A sample is any subgroup or subaggregate drawn by some 
appropriate method from a population, the method used in drawing the 
sample being important. Methods used in drawing samples will be discussed 
in later chapters of this book. Having drawn his sample, the investigator 
utilizes appropriate statistical methods to describe its properties. He then 
proceeds to make statements about the properties of the population from his 
knowledge of the properties of the sample; that is, he proceeds to generalize 
from the sample to the population. To return to the example above, an 
investigator might draw a sample of 1,000 adult Canadians, the term adult 
being assigned a precise meaning, measure their attitudes toward immigrants 
using an acceptable technique of measurement, and calculate the required 
statistics. Questions may then be raised about the attitudes of all adult 
Canadians from the information obtained from a study of the sample of 1,000. 

The fact that inferences can be made about the properties of populations 
from a knowledge of the properties of samples is basic in research thinking. 
Such statements are of course subject to error. The magnitude of the error
involved in drawing such inferences can, however, in most cases be estimated
by appropriate procedures. Where no estimate of error of any kind can be 
made, generalizations about populations from sample data are worthless. 
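
The idea that a sample statistic is subject to error can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the population of 100,000 attitude responses, the true proportion of 0.6, the seed, and the function name `sample_proportion` are all hypothetical. Samples are drawn at random without replacement, which is one of several possible sampling methods.

```python
import random

rng = random.Random(1959)  # fixed seed so the sketch is repeatable

# Hypothetical finite population: 1 = favorable attitude, 0 = unfavorable.
population = [1] * 60_000 + [0] * 40_000
true_proportion = sum(population) / len(population)

def sample_proportion(sample_size):
    """Draw one random sample without replacement and return its proportion."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# One sample of 1,000, as an investigator would actually draw.
one_estimate = sample_proportion(1_000)

# Repeating the draw shows how much estimates vary from sample to sample.
estimates = [sample_proportion(1_000) for _ in range(200)]
largest_error = max(abs(p - true_proportion) for p in estimates)

print(f"population proportion:        {true_proportion:.3f}")
print(f"one sample estimate:          {one_estimate:.3f}")
print(f"largest error in 200 samples: {largest_error:.3f}")
```

In practice the investigator has only one sample and the population value is unknown, so the spread seen here must itself be estimated from that single sample.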

Information about properties of particular samples, quite apart from any 
generalizations about the population, is of little intrinsic interest in itself. 
Consider a case where the investigator's interest is in the relative effects of 
two types of psychotherapy when applied to patients suffering from a particu- 
lar mental disorder. He may select two samples of patients, apply one type 
of treatment to one sample and the other type of treatment to the other 
sample, and collect data on the relative rates of recovery of patients in the 
two samples. Clearly, in this case his interest is in finding out whether the 
one treatment is better or worse than the other when applied to the whole 
class of patients suffering from the mental disorder in question. He is 
interested in the sample data only in so far as these data enable him to draw 
inferences with some acceptable degree of assurance about this general 
question. His experimental procedures must be designed to enable the 
drawing of such inferences, otherwise the experiment serves no purpose. 
On occasion research reports are found where the investigator states that the 
experimental results obtained should not be generalized beyond the particular 
sample of individuals who participated in the study. The adoption of this 
view means that the investigator has missed the essential nature of experi-
mentation. Unless the intention is to generalize from a sample to a popula-
tion, unless the procedures used are such as to enable such generalizations
justifiably to be made, and unless some estimate of error can be obtained, the 
conduct of experiments is without point. 

Statistical procedures used in describing the properties of samples, or of 
populations where complete population data are available, are referred to by 
some writers as descriptive statistics. If we measure the IQ of the complete
population of students in a particular university and compute the mean IQ,
that mean is a descriptive statistic because it describes a characteristic of the 
complete population. If, on the other hand, we measure the IQ of a sample 
of 100 students and compute the mean IQ for the sample, that mean is also 
a descriptive statistic. 

Statistical procedures used in the drawing of inferences about the proper- 
ties of populations from sample data are frequently referred to as sampling 
statistics. If, for example, we wish to make a statement about the mean IQ
in the complete population of students in a particular university from a 
knowledge of the mean computed on the sample of 100 and estimate the error 
involved in this statement, we use procedures from sampling statistics. The 
application of these procedures provides information about the accuracy of 
the sample mean as an estimate of the population mean; that is, it indicates 
the degree of assurance we may place in the inferences we draw from the
sample to the population.


While the distinction between descriptive and sampling statistics is a 
useful one, it may be emphasized that the ultimate object of statistical 
method is the making of statements about populations. A mean calculated 
on a sample provides information about the population from which the sample 
is drawn, although in any particular instance the information may be very 
inaccurate. The ultimate intent is in all instances to find things out about 
populations. Most statistical methods, whether referred to as descriptive 
or sampling methods, are means to this end. 
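
The university IQ example can be sketched as follows. The population is simulated and its mean of 115 and standard deviation of 12 are assumptions made for illustration. The error estimate used is the standard error of the mean, s divided by the square root of the sample size. That formula is not introduced in this section and is offered only as one commonly used estimate of error.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical complete population of student IQs (simulated, not real data).
population = [rng.gauss(115, 12) for _ in range(5_000)]
population_mean = statistics.mean(population)  # descriptive: whole population

# A sample of 100 students drawn at random from that population.
sample = rng.sample(population, 100)
sample_mean = statistics.mean(sample)          # descriptive: the sample itself

# Sampling statistics: estimate the error attached to the sample mean.
s = statistics.stdev(sample)                   # sample standard deviation (N - 1)
standard_error = s / math.sqrt(len(sample))

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"standard error:  {standard_error:.2f}")
```

The first two means are descriptive statistics. The standard error belongs to sampling statistics, since it speaks to how far the sample mean may lie from the population mean.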

In this section no discussion is advanced on methods of drawing samples 
or the conditions which these methods must satisfy to allow the drawing of 
valid inferences from the sample to the population. Further, no precise 
meaning has been assigned to the term error. These topics will be elaborated 
at a later stage. 


1.5. Parameters and Estimates 


A clear distinction is usually drawn between parameters and estimates. 
A parameter is a property descriptive of the population. The term estimate 
refers to a property of a sample drawn at random from a population. The
sample value is presumed to be an estimate of a corresponding population 
parameter. Suppose, for example, that a sample of 1,000 adult male 
Canadians of a given age range is drawn from the total population, the height 
of the members of the sample measured, and a mean value, 68.972 in., 
obtained. This value is an estimate of the population parameter which 
would have been obtained had it been possible to measure all the members in 
the population. Usually parameters or population values are unknown. 
We estimate them from our sample values. The distinction between 
parameter and estimate reflects itself in statistical notation. A widely used 
convention in notation is to employ Greek letters to represent parameters 
and Roman letters to represent estimates. Thus the symbol σ, the Greek
letter sigma, may be used to represent the standard deviation in the popula-
tion, the standard deviation being a commonly used measure of variability.
The symbol s may be used as an estimate of the parameter σ. This conven-
tion in notation is applicable only within broad limits. By and large we shall
adhere to this convention in this book, although in certain instances it will 
be necessary to depart from it. 
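
A minimal sketch of the distinction between the parameter σ and the estimate s, using Python's standard `statistics` module. The eight population values and the four sample values are invented for illustration. In that module `pstdev` divides by N and `stdev` divides by N − 1; how this relates to the definitions adopted later in this book is not assumed here.

```python
import math
import statistics

# A small hypothetical population whose every member is known.
population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = statistics.pstdev(population)  # parameter: divides by N

# A sample drawn from that population (chosen by hand for the example).
sample = [2, 4, 5, 9]
s = statistics.stdev(sample)           # estimate: divides by N - 1

print(f"sigma (population) = {sigma:.3f}")
print(f"s (sample)         = {s:.3f}")
```

In most real problems only the sample is available, so σ remains unknown and s stands in for it.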


1.6. Variables and Their Classification 


The term variable refers to a property whereby the members of a group or 
set differ one from another. The members of a group may be individuals 
and may be found to differ in sex, age, eye color, intelligence, auditory acuity, 
reaction time to a stimulus, attitudes toward a political issue, and in a
thousand other ways. Such properties are variables. The term constant
refers to a property whereby the members of a group do not differ one from 
another. In a sense a constant is a particular type of variable; it is a variable
which does not vary from one member of a group to another or within a 
particular set of defined conditions. 

Labels or numerals may be used to describe the way in which one member 
of a group is the same as or different from another. With variables like sex, 
racial origin, religious affiliation, and occupation, labels are employed to 
identify the members which fall within particular classes. An individual 
may be classified as male or female; of English, French, or Dutch racial 
origin; Protestant or Catholic; a shoemaker or a farmer; and so on. The 
label identifies the class to which the individual belongs. Sex for most 
practical purposes is a two-valued variable, individuals being either male or 
female. Occupation, on the other hand, is a multivalued variable. Any 
particular individual may be assigned to any one of a large number of classes. 
With variables like height, weight, intelligence, and so on, measuring opera- 
tions may be employed which enable the assignment of descriptive numerical 
values. An individual may be 72 in. tall, weigh 190 lb, and have an IQ of 90.

The particular values of a variable are referred to as variates, or variate 
values. To illustrate, in considering the height of adult males, height is the 
variable, whereas the height of any particular individual is a variate, or
variate value. 

In dealing with variables which bear a functional relationship one to
another the distinction may be drawn between dependent and independent
variables. Consider the expression

Y = f(X)

This expression says that a given variable Y is some unspecified function of
another variable X. The symbol f is used generally to express the fact that
a functional relationship exists, although the precise nature of the relationship
is not stated. In any particular case the nature of the relationship may be
known; that is, we may know precisely what f means. Under these
circumstances, for any given value of X a corresponding value of Y can be
calculated; that is, given X and a knowledge of the functional relationship,
Y can be predicted. It is customary to speak of Y, the predicted variable,
as the dependent variable because the prediction of it depends on the value
of X and the known functional relationship, whereas X is spoken of as the
independent variable. Given an expression of the kind Y = X², for any
given value of X an exact value of Y can readily be determined. Thus if
X is known, Y is also known exactly. Many of the functional relationships
found in statistics permit probabilistic and not exact prediction to occur.
Such relationships may provide the most probable value of Y for any given
value of X, but do not permit the making of perfect predictions.


A distinction may be drawn between continuous and discrete (or discon- 
tinuous) variables. A continuous variable may take any value within a 
defined range of values. The possible values of the variable belong to a 
continuous series. Between any two values of the variable an indefinitely 
large number of in-between values may occur. Height, weight, and chrono- 
logical time are examples of continuous variables. A discontinuous or
discrete variable can take specific values only. Size of family is a discon- 
tinuous variable. A family may be comprised of 1, 2, 3 or more children, 
but values between these numbers are not possible. The values obtained in 
rolling a die are 1, 2, 3, 4, 5, and 6. Values between these numbers are not 
possible. Although the underlying variable may be continuous, all sets of 
real data in practice are discontinuous or discrete. Convenience and errors
of measurement impose restrictions on the refinement of the measurement 
employed. 

Another classification of variables is possible which is of some importance 
and is of particular interest to statisticians. This classification is based on 
differences in the type of information which different operations of classifica- 
tion or measurement yield. To illustrate, consider the following situations. 
An observer using direct inspection may rank order a group of individuals 
from the tallest to the shortest according to height. On the other hand, he 
may use a foot rule and record the height of each individual in the group in 
feet and inches. These two operations are clearly different, and the nature 
of the information obtained by applying the two operations is different. The 
former operation permits statements of the kind: individual A is taller or
shorter than individual B. The latter operation permits statements of how
much taller or shorter one individual is than another. Differences along 
these lines serve as a basis for a classification of variables, the class to which a 
variable belongs being determined by the nature of the information made 
available by the measuring operation used to define the variable. Four 
broad classes of variables may be identified. These are referred to as (1)
nominal, (2) ordinal, (3) interval, and (4) ratio variables. This classification
is discussed in some detail by Stevens (1951, Chap. 1). A recent and 
very interesting discussion relevant to this topic is given in Torgerson 
(1958). 

A nominal variable is a property of the members of a group defined by an 
operation which permits the making of statements only of equality or dif- 
ference. Thus we may state that one member is the same as or different from 
another member with respect to the property in question. Statements 
about the ordering of members, or the equality of differences between 
members, or the number of times a particular member is greater than or less 
than another are not possible. To illustrate, individuals may be classified 
by the color of their eyes. Color is a nominal variable. The statement that
an individual with blue eyes is in some sense "greater than" or "less than"
an individual with brown eyes is meaningless. Likewise the statement
that the difference between blue eyes and brown eyes is equal to the dif- 
ference between brown eyes and green eyes is meaningless. The only kind of 
meaningful statement possible with the information available is that the eye 
color of one individual is the same as or different from the eye color of another. 
A nominal variable may perhaps be viewed as a primitive type of variable, 
and the operations whereby the members of a group are classified according 
to such a variable constitute a primitive form of measurement. In dealing 
with nominal variables numerals may be assigned to represent classes, but 
such numerals are labels, and the only purpose they serve is to identify the 
members within a given class. 

An ordinal variable is a property defined by an operation which permits 
the rank ordering of the members of a group; that is, not only are statements 
of equality and difference possible, but also statements of the kind greater 
than or less than. Statements about the equality of differences between 
members or the number of times one member is greater than or less than 
another are not possible. If a judge is required to order a group of individuals
according to aggressiveness, or cooperativeness, or some other quality, the 
resulting variable is ordinal in type. Many of the variables used in psy- 
chology are ordinal. 

A distinction may be made between two types of ordinal variables, those 
with a natural origin, or “zero” point, and those without a natural origin 
(Torgerson, 1958). An ordering of pupils on intelligence by a teacher is an
ordinal variable without a natural origin. On ordering a set of stimuli 
according to their pleasantness, the point of transition from unpleasant to 
pleasant may be taken as a natural origin. 

An interval variable is a property defined by an operation which permits 
the making of statements of equality of intervals, in addition to statements 
of sameness or difference or greater than or less than. Thus we may state 
that the difference between individuals A and B is equal to the difference 
between individuals B and C. An interval variable does not have a true zero
point, or natural origin, although in many cases a zero point may for
convenience be arbitrarily defined. Temperature as measured by a centigrade
or Fahrenheit thermometer and calendar time are examples of interval
variables.

A ratio variable is a property defined by an operation which permits the
making of statements of equality of ratios in addition to all the other kinds 
of statements discussed above. This means that one variate value may be 
spoken of as double or triple another, and so on. An absolute zero is always 
implied. The numbers used represent distances from a natural origin. 
Length, weight, and the numerosity of aggregates are examples of ratio 
variables. In psychological work variables which conform to the rigorous 
requirements of ratio variables are not numerous. Scales for measuring
loudness, pitch, and other variables have been developed by Stevens and his
associates (1957) at Harvard. These appear to satisfy all the conditions of 
ratio variables. 

Some writers distinguish between quantitative and qualitative variables 
without being explicit about the nature of this distinction. In the present 
classificatory system nominal and ordinal variables may be spoken of as 
qualitative, and interval and ratio variables as quantitative. 
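
The four classes form a cumulative hierarchy: each permits every kind of statement allowed by the class before it, plus one more. The following Python sketch is only an illustration of that idea; the names `SCALES`, `STATEMENTS`, `permitted`, and `is_quantitative` are hypothetical and do not come from the text.

```python
# Classes of variable, ordered from least to most informative.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Kind of statement first permitted at each level (same order as SCALES).
STATEMENTS = [
    "equality or difference",
    "greater than or less than",
    "equality of intervals",
    "equality of ratios",
]

def permitted(scale):
    """Return every kind of statement that a variable of this class permits."""
    level = SCALES.index(scale)
    return STATEMENTS[: level + 1]

def is_quantitative(scale):
    """Interval and ratio variables are spoken of as quantitative."""
    return scale in ("interval", "ratio")

print(permitted("ordinal"))
print(is_quantitative("ordinal"))
```

Read this way, eye color (nominal) supports only the first kind of statement, whereas length (ratio) supports all four.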

Most statistical methods have been developed for the handling of problems 
involving interval and ratio variables. A method which is appropriate in 
dealing with one type of variable may not be appropriate with another. In 
practice, however, we frequently apply procedures appropriate to one type 
of variable to problems involving other classes of variables. This means that 
we either discard information which we do in fact possess or assume that we 
have information which we do not possess. An example of this latter type of 
situation arises frequently with rank-order data. The members of a group 
are ordered. Our information consists of relationships greater than or less 
than, and these are described by a set of ordinal numbers; thus one member 
is first, another second, and so on. It is a common practice to replace such a 
set of ordinal numbers by the corresponding set of cardinal numbers, 1, 2,
3, . . . , N, and to proceed then to apply arithmetical operations to these
numbers. This means that certain assumptions are made. Information is
superimposed on the data which the measuring operation did not yield; that 
is, for computational purposes we assume we are in possession of information 
which actually we do not have. In the above instance we are making an 
assumption about equality of intervals when in fact the measuring operation 
employed does not yield information of this kind. The assumption is that 
the difference between the first and second individual is equal to the difference 
between the second and third, and so on. In psychological work many 
variables are either nominal or ordinal. For example, scores on intelligence 
tests, attitude scales, personality tests, and the like, are in effect ordinal 
variables. We cannot say, for example, that the difference between an IQ 
of 80 and an IQ of 90 is in any sense equal to the difference between an IQ of 
110 and an IQ of 120. None the less such variables are frequently treated 
by methods which, from a rigorous logical viewpoint, are appropriate only to 
interval and ratio variables. The suggestion is not made here that the 
practice of assuming that we have information we do not have, or the converse 
practice of discarding information we do in fact have, be discontinued, 
although a logical puritan might be led to this position. Frequently practical 
necessity dictates a particular procedure. Nevertheless it is a matter of 
some importance to know the nature of the information contained in the data. 
We should be able to distinguish clearly between this and the information 
either imposed or discarded for the purpose of making some process of calcu- 
lation possible. In other words, our understanding of precisely what we are
doing is enriched by knowing the nature of the assumptions made at each
stage in the application of any procedure. 
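
The effect of replacing measurements by ranks can be seen in a small numerical sketch. The heights below are hypothetical values chosen for illustration; the point is that successive ranks always differ by 1, whatever the measured differences happen to be.

```python
# Hypothetical heights in inches for four individuals (illustrative values only).
heights = {"A": 75, "B": 74, "C": 68, "D": 61}

# Rank order from tallest (rank 1) to shortest.
ordered = sorted(heights, key=heights.get, reverse=True)
ranks = {person: position for position, person in enumerate(ordered, start=1)}

# Differences between successive ranks are all 1 by construction ...
rank_gaps = [ranks[b] - ranks[a] for a, b in zip(ordered, ordered[1:])]
# ... whereas the measured differences between the same individuals are unequal.
height_gaps = [heights[a] - heights[b] for a, b in zip(ordered, ordered[1:])]

print(rank_gaps)    # [1, 1, 1]
print(height_gaps)  # [1, 6, 7]
```

Treating the ranks 1, 2, 3, 4 as cardinal numbers therefore assumes equal intervals that the ranking operation itself did not establish.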


1.7. On Calculating 


If possible, skill in the operation of a calculating machine should be acquired
at an early stage in the study of statistics. Many of the statistical problems
which present themselves in experimental work in psychology involve much 
computation, and without a calculator the arithmetical labor is prohibitive. 
Skill in the operation of a calculator can be readily acquired, a reasonable 
level of performance on simple operations being attained by most students in 
a few hours of practice. Not only can the simple arithmetical operations of 
addition, subtraction, multiplication, and division be rapidly performed on 
many of the widely used calculating machines, but also many short cuts and 
combinations of operations are possible. For example, the sum of products 
of two sets of variate values ΣXY may be accumulated. The value of the
term ΣXY is required in the calculation of the correlation coefficient, a
statistic which provides a measure of the relationship between two variables. 
Statistical procedures can frequently be adapted to suit the capabilities of 
particular machines. The calculation of a square root is usually not an 
efficient operation on most calculators. Square roots can be more quickly 
obtained by consulting a table of square roots or by direct calculation. A 
valuable aid in computing is Barlow's Tables (Comrie, 1947). These tables 
were originally prepared by Peter Barlow at the Royal Military Academy, 
Woolwich, and first published in 1814. The 1941 edition of the Tables 
provides the squares, cubes, square roots, cube roots, and reciprocals of all
integers up to 12,500. 

In computing, the importance of adequate checks on the accuracy of the 
calculation cannot be too emphatically stressed. Every calculation should 
be checked either by repetition or by the employment of some checking device 
which guarantees accuracy. There is no substitute for accuracy. The 
conduct of an experiment serves no purpose unless correct inferences are 
drawn from the data. The correctness of the inferences drawn cannot be 
assured unless the statistical procedures employed are appropriate to the 
data and unless these procedures are accurately applied. Students not 
infrequently feel that the statistical analysis of a set of data is laborious and 
time-consuming and in their haste to arrive at some kind of result may dis- 
regard checks which are necessary to ensure the accuracy of their calculations. 
When tempted in this direction, the student should observe that the time 
spent in the proper statistical analysis of a set of data represents in most 
instances a small proportion of the time required to plan the experiment and 
gather the data. A slipshod analysis may throw in jeopardy the total 
investment of time and effort. 


1.8. Units of Measurement 


When dealing with continuous variables a unit of measurement may be 
regarded as any defined subdivision of a scale, however fine. In measuring 
length the units may be inches, yards, and miles or centimeters, meters, and 
kilometers. In measuring weight the units may be ounces and pounds or 
grams and kilograms. In measuring chronological time the units may be 
days, months, or years. 

With continuous variables, although all values are theoretically possible 
within any range of values, we select a unit of measurement and record our 
observations as discrete values. All experimental observations, however 
obtained, are recorded as discrete values. Thus the length of a desk or the 
height of a man may be measured to the nearest inch, or tenth of an inch, or 
hundredth of an inch, the unit of measurement in each case being 1 in.,
1/10 in., or 1/100 in., respectively, and the number of such units involved in
any particular measurement must, of necessity, be recorded as a discrete 
number. 

The fineness of the unit of measurement employed is determined by the 
accuracy which the nature of the situation demands or by the accuracy which 
the instrument of measurement allows, or both. In the measurement of 
time intervals, for example, great accuracy can be obtained by the use of 
appropriate measuring devices. In measuring the time required for a child to 
solve a problem it is certainly adequate for all practical purposes to record 
the observation in seconds. In reaction-time experiments, however, we may 
require a unit of measurement of a hundredth or perhaps a thousandth part 
of a second. Further, the unit should reflect the accuracy of the measuring
operation. To illustrate, an intelligence quotient is calculated by dividing 
mental age by chronological age, both expressed in months, and multiplying 
by 100. Quite clearly, we could speak of a child's intelligence quotient as 
being 103.3, or 103.23, or something of the sort. Such an attempt at accuracy 
would be spurious because of the large error of measurement which is known 
to attach to the intelligence quotient. In practice, intelligence quotients 
are always recorded to the nearest whole number. 
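
The arithmetic of the intelligence quotient can be sketched as follows. The mental and chronological ages are hypothetical values chosen for illustration, and the function name `intelligence_quotient` is an assumption of this sketch.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """Mental age divided by chronological age, multiplied by 100."""
    return 100 * mental_age_months / chronological_age_months

# Hypothetical child: mental age 128 months, chronological age 124 months.
raw = intelligence_quotient(128, 124)

print(round(raw, 2))  # 103.23, a spuriously precise figure
print(round(raw))     # 103, the value that would be recorded
```

The extra decimal places are produced by the division, not by the measuring operation, which is why they are dropped.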

When we record measurements of a continuous variable as discrete numbers 
in so many units, we imply in most cases that had a more accurate form of 
measurement been used, were this possible and desirable, the value thereby 
obtained would fall within certain limits, these limits being defined as one-half 
a unit above and below the value reported. Thus when we report a measure- 
ment to the nearest inch, say, 26 in., this is assumed to mean that the observa- 
tion falls within the limits 25.5 and 26.5, or more precisely that it is greater 
than or equal to 25.5 and less than 26.5. Likewise, a measurement made to 
the nearest tenth part of an inch, say, 31.7, is assumed to fall within the limits 
31.65 and 31.75. In a reaction-time experiment a particular observation
measured to the nearest thousandth of a second might be, say, .196 sec.
This means that the measurement is taken as falling within the limits
.1955 and .1965 sec.

An exception to the above is age. When we state that a person is 18 years 
old, we do not mean in conventional usage that his age falls within the limits 
17 years 6 months and 18 years 6 months. A person is ordinarily spoken of
as 18 years old until his 19th birthday. His age is greater than or equal to 
18 years and less than 19 years. Similarly to state that a person is 126 
months old means that he is greater than or equal to 126 months and less 
than 127 months. Definitions of age other than the above are used for 
particular purposes. 

Questions of the above kind do not, of course, arise with discrete variables. 
The number of animals in a cage, or children in a classroom, or teeth in a 
child's head are discrete observations, and to imply a range of values within 
which any particular observation is assumed to fall is not meaningful. 
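
The conventions of this section can be sketched in a few lines of Python. The function names `limits` and `age_limits` are hypothetical, and `Decimal` is used only so that half-units such as 0.05 are represented exactly; the reaction time .196 sec is treated here as recorded in units of one-thousandth of a second.

```python
from decimal import Decimal

def limits(recorded, unit):
    """Limits implied by a value recorded to the nearest `unit`.

    The true value is taken to be at or above the lower limit and below
    the upper limit. Arguments are strings so the arithmetic stays exact.
    """
    half = Decimal(unit) / 2
    value = Decimal(recorded)
    return value - half, value + half

def age_limits(stated_age):
    """Conventional age: from the stated birthday up to, but not including, the next."""
    return stated_age, stated_age + 1

print(limits("26", "1"))          # limits 25.5 and 26.5
print(limits("31.7", "0.1"))      # limits 31.65 and 31.75
print(limits("0.196", "0.001"))   # limits 0.1955 and 0.1965
print(age_limits(18))             # 18 up to, but not including, 19
```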


1.9. Summation Notation 


Statistical notation is a language with its own grammatical rules. One 
of the more frequently used forms of notation is spoken of as summation 
notation. Some familiarity with this class of notation should be acquired 
as early as possible in the study of statistics. 

Let X be a variable and $X_1, X_2, \ldots, X_N$ a set of variate values. To
illustrate, X might refer to a measure of the activity of rats in a maze. The
symbols $X_1, X_2, \ldots, X_N$ would then refer to measures of activity for
individual rats, there being N rats in the group. The sum
$X_1 + X_2 + \cdots + X_N$, that is, all the individual measures added together,
may be written as

$$\sum_{i=1}^{N} X_i$$

Thus

$$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N \tag{1.1}$$

The symbol Σ is the Greek capital letter sigma and refers to the simple
operation of adding things up. In the language of mathematics it is a verb.
The symbols above and below the summation sign define the limits of the
summation. In a sense they function as adverbs. Thus $\sum_{i=1}^{N}$ means the
addition of all values formed by assigning to i the values of every positive
integer from i = 1 to i = N, inclusive. For example, let the numbers 10,
12, 19, 21, 32 be measures of activity for a group of five rats. The sum of
these five scores may be represented symbolically by $\sum_{i=1}^{5} X_i$ and in this case
is equal to 94. Where the limits of the summation are clearly understood
from the context, which is very frequently the case, it is customary to omit
the notation above and below the summation sign and write $\sum X_i$, or simply
$\sum X$.

There are a number of very simple theorems which are useful in handling
problems involving summation notation. 

Theorem 1. If every variate value in a group is multiplied by a constant
number or factor, that factor may be removed from under the summation
sign and written outside as a factor. Thus

$$\sum_{i=1}^{N} cX_i = cX_1 + cX_2 + \cdots + cX_N = c(X_1 + X_2 + \cdots + X_N) = c\sum_{i=1}^{N} X_i \tag{1.2}$$

This means that if we multiply each one of the measures 10, 12, 19, 21, 32
by any constant, say, 5, the sum of the resulting measures will be given
directly by 5 × 94.


Theorem 2. The summation of a constant over N terms is equal to Nc.
Thus

$$\sum_{i=1}^{N} c = c + c + \cdots + c = Nc \tag{1.3}$$

If c = 5 and N equals 4, it is obvious that 5 + 5 + 5 + 5 = 4 × 5 = 20.
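
Theorem 3. The summation of the sum of any number of terms is the sum
of the summations of these terms taken separately. Thus

$$\sum_{i=1}^{N} (X_i + Y_i + Z_i) = X_1 + Y_1 + Z_1 + X_2 + Y_2 + Z_2 + \cdots + X_N + Y_N + Z_N = \sum_{i=1}^{N} X_i + \sum_{i=1}^{N} Y_i + \sum_{i=1}^{N} Z_i \tag{1.4}$$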
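
The summation sign and Theorems 1 to 3 can be checked numerically. In the sketch below, X holds the five activity scores used in the text; Y and Z are hypothetical additional sets of values introduced only to exercise Theorem 3.

```python
# Activity scores for the five rats in the example above.
X = [10, 12, 19, 21, 32]
# Y and Z are hypothetical second and third sets of paired values.
Y = [2, 3, 7, 10, 4]
Z = [1, 1, 2, 3, 5]
c = 5
N = len(X)

total = sum(X)  # the sum of X_i for i = 1 to N

# Theorem 1: a constant factor may be moved outside the summation sign.
theorem_1 = sum(c * x for x in X) == c * sum(X)
# Theorem 2: a constant summed over N terms equals N times the constant.
theorem_2 = sum(c for _ in range(N)) == N * c
# Theorem 3: the summation of a sum is the sum of the separate summations.
theorem_3 = sum(x + y + z for x, y, z in zip(X, Y, Z)) == sum(X) + sum(Y) + sum(Z)

print(total)                            # 94
print(c * total)                        # 470
print(theorem_1, theorem_2, theorem_3)  # True True True
```

A numerical check of this kind does not prove the theorems, but it is a quick guard against misreading the notation.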


Theorem 4. The sum of the first N integers is

$$\frac{N(N + 1)}{2}$$

Consider the integers 1, 2, 3, . . . , (N - 2), (N - 1), N. It is observed
that the sum of the first and last integers in the series is equal to N + 1, the
sum of the second and the second from the last is equal to N + 1, and so on.
In any series where N is even, there are N/2 such pairs, and the sum of the
series is given by N(N + 1)/2. Where N is odd, there are (N - 1)/2 such
pairs, plus the middle term, which is equal to (N + 1)/2. The sum of the
series is then

$$\frac{(N - 1)}{2}(N + 1) + \frac{(N + 1)}{2} = \frac{N(N + 1)}{2}$$
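
The pairing argument can be written out directly and compared with the closed form. This is a minimal sketch; the function names `sum_first_n` and `sum_by_pairing` are hypothetical.

```python
def sum_first_n(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def sum_by_pairing(n):
    """Pair the first integer with the last, the second with the second from last, and so on."""
    pairs = n // 2               # N/2 pairs if N is even, (N - 1)/2 if N is odd
    total = pairs * (n + 1)      # each pair sums to N + 1
    if n % 2 == 1:
        total += (n + 1) // 2    # the unpaired middle term when N is odd
    return total

# Each row: N, closed form, pairing argument, direct addition.
for n in (4, 5, 10):
    print(n, sum_first_n(n), sum_by_pairing(n), sum(range(1, n + 1)))
```

For each N the three results agree, for even and odd N alike.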


An expression frequently encountered in statistics is $\sum_{i=1}^{N} X_i Y_i$. This refers
to the sum of the products of two sets of paired numbers. If, for example,
5, 6, 12, 15 are the scores, X, of four people on a test, and 2, 3, 7, 10 are the
scores, Y, of the same four people on another test, then $\sum_{i=1}^{N} X_i Y_i$ refers to the
sum of products and is equal to 5 × 2 + 6 × 3 + 12 × 7 + 15 × 10,
or 262.
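
The sum of products for the two sets of scores above can be computed as follows. The sketch also shows, as an aside not made in the text, that the sum of products differs from the product of the two sums.

```python
X = [5, 6, 12, 15]   # scores of four people on one test
Y = [2, 3, 7, 10]    # scores of the same four people on another test

# Multiply each pair of scores, then add the products.
sum_of_products = sum(x * y for x, y in zip(X, Y))
print(sum_of_products)   # 262

# The product of the two sums is a different quantity.
print(sum(X) * sum(Y))   # 836
```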

The notation used in elementary statistics is simple, and skill in its manipu- 
lation can be acquired with a little practice. A good understanding of the
nature of statistical method and its applications can be acquired with very 
little in the way of mathematical training at all. A little knowledge of 
arithmetic and a little elementary algebra go a long way in the study of 
statistics. 


EXERCISES 


1. Indicate with examples the differences between (a) population and sample, (b) finite and 
infinite populations, (c) descriptive and sampling statistics, (d) parameters and esti- 
mates, (e) dependent and independent variables, (f) continuous and discrete variables, 
(g) quantitative and qualitative variables. 

2. Classify the following as nominal, ordinal, interval, or ratio variables: (a) marks on a
university examination, (b) age of school children, (c) eye color, (d) sex, (e) reaction
time, (f) racial origin, (g) ratings of scholastic success, (h) calendar time.

3. Write the following in summation notation:

(a) Xi X:+ ‘۰° +X 

() Vit Yat +++ + Yr 

(с) (Xi + YD + (Xa + Ya) + °°° + (X; + Y) 
(d) Х.У, + Ха + +++ + Ху 

(е) Хү, + ХҮ, 43:7 + Xy'¥y 

DN (Xı +e) + (Xs +O) + °°° + (Xs + e) 

( сХ,+‹Х«+ +++ + СХ 

(h) Xi/c + Х/с+ +++ + Xfce 

(i) cXfYı + ‹ХФҮ, + + + + + сХмҮх 


4. Write each of the following in full: 


2 3 
(a) у Xi (d) с X X#Y; 
i=l i=l 
(b) ў Xd (е) у Xj + 4с 
i=l ar 
д Ў QG + Y) 0 D X:+ n 


s sim Facere area Ў о 
i=l 


6. Which of the following are true and which are false?


o lnin- Y xx 


i=l i=l 


o y- + 


© Ў ад X ха 


i N pe N N 
(d) ў (xi 4 У)? = ye Xa+ 2 Yi42 >, XY, 


7. What is the sum of the first 100 integers? 


CHAPTER 2 


FREQUENCY DISTRIBUTIONS AND THEIR 
GRAPHIC REPRESENTATION 


2.1. Introduction 


The data resulting from experiments are usually collections of numbers.
Classification and description of these numbers are required to assist interpre- 
tation. Advantages attach to the classification of data in the form of 
frequency distributions. Such classification assists a comprehension of 
important properties of the data and may reduce the arithmetical labor in 
calculating certain statistics. A frequency distribution is an arrangement of 
data which shows the frequency of occurrence of the different values of the 
variable or defined groupings of the values of the variable. 


2.2. Classification of Data 


Consider the data in Table 2.1. These are the intelligence quotients of 
100 children obtained from a psychological test. As a first step in the 
direction of classification we may rank order the 100 intelligence quotients 
in order of magnitude, proceeding from the largest to the smallest as shown 
in Table 2.2. An arrangement of this kind is called a rank distribution. 
Such an arrangement of data has few advantages. Inspection of the rank 
data, however, shows that many scores occur more than once; thus there are 
five 103’s, three 100’s, and so on. This suggests that the data might be 
arranged in columns, as shown in Table 2.3, one column listing the possible 
scores and the other listing the number of times each score occurs. Such an 
arrangement of data is a frequency distribution, and the number of times a 
particular score value occurs is a frequency, represented by the symbol f.
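
A frequency distribution with one class per score value can be obtained by counting. The sketch below enters the 100 intelligence quotients of Table 2.1 row by row and counts them with `collections.Counter` from the Python standard library; the variable names are assumptions of this sketch.

```python
from collections import Counter

# Intelligence quotients of Table 2.1, entered row by row.
iqs = [
    109, 111, 82, 105, 134, 113, 90, 79, 100, 117,
    80, 90, 121, 75, 93, 99, 90, 92, 96, 82,
    101, 104, 80, 81, 83, 104, 93, 109, 72, 110,
    111, 91, 109, 111, 81, 122, 83, 92, 101, 77,
    99, 103, 93, 91, 67, 108, 93, 84, 84, 100,
    102, 84, 96, 89, 81, 107, 95, 91, 107, 102,
    109, 93, 82, 103, 116, 86, 78, 73, 104, 104,
    103, 108, 76, 94, 108, 72, 87, 121, 80, 127,
    105, 103, 106, 119, 90, 93, 89, 110, 103, 100,
    99, 79, 117, 114, 117, 93, 82, 98, 89, 119,
]

# Map each score value to the number of times it occurs (its frequency f).
frequency = Counter(iqs)

# Show the five largest scores and their frequencies.
for score in sorted(frequency, reverse=True)[:5]:
    print(score, frequency[score])

print(frequency[103], frequency[100])  # 5 3
```

Scores that never occur, such as 133, have a frequency of zero and simply do not appear among the counted values.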

In Table 2.3 the data have been classified in as many classes as there are
score values within the total range of the variable. The number of classes
is large. Usually it is advisable to reduce the number of classes by arranging
the data in arbitrarily defined groupings of the variable; thus all scores
within the range 65 to 69, that is, all scores with the values 65, 66, 67, 68,
and 69, may be grouped together. All scores within the ranges 70 to 74, 75
to 79, and so on, may be similarly grouped. Such groupings of data are
usually done by entering a tally mark for each score opposite the range of the
variable within which it falls and counting these tally marks to obtain the
number of cases within the range. This procedure is shown in Table 2.4.


TABLE 2.1*
INTELLIGENCE QUOTIENTS MADE BY 100 PUPILS ON A MENTAL TEST

109  111   82  105  134
113   90   79  100  117
 80   90  121   75   93
 99   90   92   96   82
101  104   80   81   83
104   93  109   72  110
111   91  109  111   81
122   83   92  101   77
 99  103   93   91   67
108   93   84   84  100
102   84   96   89   81
107   95   91  107  102
109   93   82  103  116
 86   78   73  104  104
103  108   76   94  108
 72   87  121   80  127
105  103  106  119   90
 93   89  110  103  100
 99   79  117  114  117
 93   82   98   89  119

* Tables 2.1 to 2.6 are reproduced from R. W. B. Jackson and George A. Ferguson,
Manual of educational statistics, University of Toronto, Department of Educational
Research, Toronto, 1942.


TABLE 2.2
RANK DISTRIBUTION OF INTELLIGENCE QUOTIENTS SHOWN IN TABLE 2.1

134  109  102   93   82
127  109  101   92   82
122  108  101   92   82
121  108  100   91   82
121  108  100   91   81
119  107  100   91   81
119  107   99   90   81
117  106   99   90   80
117  105   99   90   80
117  105   98   90   80
116  104   96   89   79
114  104   96   89   79
113  104   95   89   78
111  104   94   87   77
111  103   93   86   76
111  103   93   84   75
110  103   93   84   73
110  103   93   84   72
109  103   93   83   72
109  102   93   83   67


TABLE 2.3
FREQUENCY DISTRIBUTION OF INTELLIGENCE QUOTIENTS OF TABLE 2.1 WITH AS
MANY CLASSES AS SCORE VALUES

Score   f     Score   f     Score   f     Score   f
134     1     117     3     100     3     83      2
133     -     116     1     99      3     82      4
132     -     115     -     98      1     81      3
131     -     114     1     97      -     80      3
130     -     113     1     96      2     79      2
129     -     112     -     95      1     78      1
128     -     111     3     94      1     77      1
127     1     110     2     93      7     76      1
126     -     109     4     92      2     75      1
125     -     108     3     91      3     74      -
124     -     107     2     90      4     73      1
123     -     106     1     89      3     72      2
122     1     105     2     88      -     71      -
121     2     104     4     87      1     70      -
120     -     103     5     86      1     69      -
119     2     102     2     85      -     68      -
118     -     101     2     84      3     67      1


TABLE 2.4
FREQUENCY DISTRIBUTION OF THE INTELLIGENCE QUOTIENTS OF TABLE 2.1

Class interval   Tally                     Frequency
130-134          /                          1
125-129          /                          1
120-124          ///                        3
115-119          ///// /                    6
110-114          ///// //                   7
105-109          ///// ///// //            12
100-104          ///// ///// ///// /       16
95-99            ///// //                   7
90-94            ///// ///// ///// //      17
85-89            /////                      5
80-84            ///// ///// /////         15
75-79            ///// /                    6
70-74            ///                        3
65-69            /                          1

The range of the variable adopted is called the class interval. In the 
illustration in Table 2.4 the class interval is 5. This arrangement of data 
is also a frequency distribution, and the number of cases falling within each 
class interval is a frequency. The only difference between Tables 2.3 and 2.4 
is in the class interval, which is 1 in the former case and 5 in the latter. 
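
The grouping into class intervals of 5 can be reproduced by assigning each score to the interval whose lower limit is the largest multiple of 5 not exceeding it. The sketch below uses the scores of Table 2.1; the function name `grouped_frequencies` is hypothetical.

```python
from collections import Counter

# Intelligence quotients of Table 2.1, entered row by row.
iqs = [
    109, 111, 82, 105, 134, 113, 90, 79, 100, 117,
    80, 90, 121, 75, 93, 99, 90, 92, 96, 82,
    101, 104, 80, 81, 83, 104, 93, 109, 72, 110,
    111, 91, 109, 111, 81, 122, 83, 92, 101, 77,
    99, 103, 93, 91, 67, 108, 93, 84, 84, 100,
    102, 84, 96, 89, 81, 107, 95, 91, 107, 102,
    109, 93, 82, 103, 116, 86, 78, 73, 104, 104,
    103, 108, 76, 94, 108, 72, 87, 121, 80, 127,
    105, 103, 106, 119, 90, 93, 89, 110, 103, 100,
    99, 79, 117, 114, 117, 93, 82, 98, 89, 119,
]

def grouped_frequencies(scores, width):
    """Count scores in class intervals of the given width.

    Each interval starts at a multiple of `width`. Keys are
    (lowest score, highest score) pairs, largest interval first.
    """
    counts = Counter((s // width) * width for s in scores)
    return {(low, low + width - 1): counts[low]
            for low in sorted(counts, reverse=True)}

table = grouped_frequencies(iqs, 5)
for (low, high), f in table.items():
    print(f"{low}-{high}: {f}")
```

Calling the same function with a width of 1 gives the ungrouped distribution, which illustrates that the two arrangements differ only in the class interval.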


2.3. Conventions regarding Class Intervals 


In the arrangement of data with a class interval of 1, as shown in Table 2.3,
the original observations are retained and may be reconstructed directly from 
the frequency distribution without loss of information. Where the class 
interval is greater than 1, say, 2, 5, or 10, some loss of information regarding 
individual observations is incurred; that is, the original observations cannot 
be reproduced exactly from the frequency distribution. If the class interval 
is large in relation to the total range of the set of observations, this loss of 
information may be appreciable. If the class interval is small, the classifica- 
tion of data in the form of a frequency distribution may lead to very little 
gain in convenience over the utilization of the original observations. 

The rules listed below are widely used in the selection of class intervals. 
These rules lead in most cases to a convenient handling of the data. 

1. Select a class interval of such a size that between 10 and 20 such intervals
will cover the total range of the observations. For example, if the smallest
observation in a set were 7 and the largest 156, a class interval of 10 would be
appropriate and would result in an arrangement of the data into 16 intervals.
If the smallest observation were 2 and the largest 38, a class interval of 3
would result in an arrangement of 14 intervals. If the observations ranged
from 9 to 20, a class interval of 1 would be convenient.

2. Select class intervals with a range of 1, 2, 3, 5, 10, or 20 points. These 
will meet the requirements of most sets of data. 

3. Start the class interval at a value which is a multiple of the size of that 
interval. For example, with a class interval of 5, the intervals should start with
the values 5, 10, 15, 20, etc. With a class interval of 2, the intervals should 
start with the values 2, 4, 6, 8, 10, etc. This is, of course, highly arbitrary. 

4. Arrange the class intervals according to the order of magnitude of the 
observations they include, the class interval containing the largest observa- 
tions being placed at the top. 
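
The rules above can be sketched as a small procedure. This is one possible reading of the rules, not a prescribed algorithm; the function names `class_intervals` and `suitable_widths` are hypothetical, and the sketch assumes whole-number observations.

```python
def class_intervals(smallest, largest, width):
    """List class intervals of the given width, largest first.

    Each interval starts at a multiple of `width` (rule 3), and the
    interval holding the largest observations comes first (rule 4).
    """
    first_low = (smallest // width) * width
    last_low = (largest // width) * width
    return [(low, low + width - 1)
            for low in range(last_low, first_low - 1, -width)]

def suitable_widths(smallest, largest, candidates=(1, 2, 3, 5, 10, 20)):
    """Widths from rule 2 that give between 10 and 20 intervals (rule 1)."""
    return [w for w in candidates
            if 10 <= len(class_intervals(smallest, largest, w)) <= 20]

intervals = class_intervals(7, 156, 10)
print(len(intervals))                # 16
print(intervals[0], intervals[-1])   # (150, 159) (0, 9)
print(suitable_widths(7, 156))       # [10]
```

For observations ranging from 7 to 156, only a width of 10 among the candidate sizes yields between 10 and 20 intervals under this reading.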


2.4. Exact Limits of the Class Interval 


Where the variable under consideration is continuous, and not discrete,
we select a unit of measurement and record our observations as discrete
values. Where we record an observation in discrete form and the variable
is a continuous one, we imply that the value recorded represents a value
falling within certain limits. These limits are usually taken as one-half a
unit above and below the value reported. Thus when we report a
measurement to the nearest inch, say, 16 in., we mean that, if a more accurate form
of measurement had been used, the value obtained would fall within the
limits 15.5 and 16.5 in. Similarly, a measurement made to the nearest
tenth part of an inch, say, 31.7 in., is understood to fall within the limits
31.65 and 31.75 in. In a reaction-time experiment a particular observation
measured to the nearest thousandth of a second might be, say, .196 sec. This
assumes that had a more accurate timing device been used, the measurement
would have been found to fall somewhere within the limits .1955 and .1965 sec.


TABLE 2.5 
CLASS INTERVALS, EXACT LIMITS, AND MID-POINTS FOR FREQUENCY DISTRIBUTION 
OF INTELLIGENCE QUOTIENTS 

(1)               (2)              (3)             (4) 
Class interval    Exact limits     Mid-point of    Frequency 
                                   interval 

130-134           129.5-134.5      132.0            1 
125-129           124.5-129.5      127.0            1 
120-124           119.5-124.5      122.0            3 
115-119           114.5-119.5      117.0            6 
110-114           109.5-114.5      112.0            7 
105-109           104.5-109.5      107.0           12 
100-104            99.5-104.5      102.0           16 
 95-99             94.5-99.5        97.0            7 
 90-94             89.5-94.5        92.0           17 
 85-89             84.5-89.5        87.0 


Class intervals are usually recorded to the nearest unit and thereby reflect 
the accuracy of measurement. For various reasons it is frequently necessary 
to think in terms of so-called exact limits of the class interval. These are 
sometimes spoken of as class boundaries, or end values, and sometimes as 
real limits. Consider the class interval 95 to 99 in Table 2.4. We grouped 
within this interval all measurements taking the values 95, 96, 97, 98, and 99. 
The limits of the lower value are 94.5 and 95.5, while those of the upper value 
are 98.5 and 99.5. The total range, or exact limits, which the interval is 
presumed to cover is then clearly 94.5 to 99.5, which means all values 
greater than or equal to 94.5 and less than 99.5. 

The above discussion is applicable to continuous variables only. With 
discrete variables no distinction need be made between the class interval 
and the exact limits of the interval, the two being identical. 

Table 2.5 shows the frequency distribution of the intelligence quotients of 
Table 2.1. Column 1 shows the class interval as usually written, while 
col. 2 records the exact limits. In practice, of course, the exact limits are 
rarely recorded as in Table 2.5. 


2.5. Distribution of Observations within the Class Interval 


The grouping of data in class intervals results in a loss of information 
regarding the individual observations themselves. Scores may differ one 
from another within a limited range, and yet all be grouped within the same 
interval. In the calculation of certain statistics and in the preparation of 
graphs it becomes necessary to make certain assumptions regarding the values 
within the intervals. Two separate assumptions may be made, depending on 
the purposes we have in mind. 

The first assumption states that the observations are uniformly distributed 
over the exact limits of the interval. This assumption is made in the calcu- 
lation of such statistics as the median, quartiles, and percentiles and in the 
drawing of histograms. In Table 2.5 it will be observed that 16 cases fall 
within the interval 100 to 104, which has the exact limits 99.5 to 104.5. The 
assumption states that these 16 cases are distributed over the interval as 
follows: 


Interval Frequency 
103.5-104.5 3.2 
102.5-103.5 3.2 
101.5-102.5 3.2 
100.5-101.5 3.2 
99.5-100.5 3.2 
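The first assumption amounts to dividing the frequency of an interval equally among its unit-wide parts. A minimal sketch follows; the function name `spread_uniformly` and its arguments are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def spread_uniformly(lower_exact, upper_exact, frequency, step=1):
    """Share `frequency` equally among sub-intervals of width `step`.

    Returns (lower, upper, share) rows with the highest sub-interval first.
    """
    lower = Fraction(str(lower_exact))
    upper = Fraction(str(upper_exact))
    step = Fraction(str(step))
    n_parts = (upper - lower) / step       # number of sub-intervals
    share = Fraction(frequency) / n_parts  # cases assigned to each part
    rows = []
    edge = lower
    while edge < upper:
        rows.append((float(edge), float(edge + step), float(share)))
        edge += step
    return rows[::-1]


# Interval 100 to 104 of Table 2.5: exact limits 99.5 to 104.5, 16 cases.
for low, high, share in spread_uniformly(99.5, 104.5, 16):
    print(f"{low}-{high}  {share}")
```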


The second widely used assumption states that all the observations are 
concentrated at the mid-point of the interval, that is, that all the observa- 
tions for that interval are the same and equal to the value corresponding to 
the mid-point of the interval. The mid-point of any class interval is halfway 
between the exact limits of the interval. In the above example the mid-point 
of the interval 99.5 to 104.5 is 102. This second assumption is made in the 
calculation of such statistics as means, standard deviations, and in the draw- 
ing of frequency polygons. 

The determination of the mid-point of a class interval should present no 
difficulty. The mid-point may be conveniently obtained by adding one- 
half of the range of the class interval to the lower exact limit of that interval. 
Thus with the interval 100 to 104 the lower limit is 99.5 and one-half the class 
interval is 2.5. The mid-point is therefore 99.5 + 2.5, or 102. Consider a 
10-point class interval written in the form 100 to 109. Here the lower limit 
is 99.5 and one-half the class interval is 5. The mid-point is then 99.5 + 5, 
or 104.5. Table 2.5, col. 3, shows the mid-points of the corresponding class 
intervals. 
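The mid-point rule just described can be written as a short function. This is a sketch under the assumption that scores are recorded to the nearest `unit`; the name `mid_point` is ours.

```python
def mid_point(lower, upper, unit=1):
    """Mid-point of a class interval written in the form lower-upper."""
    lower_exact = lower - unit / 2  # lower exact limit
    width = upper - lower + unit    # range of the class interval
    return lower_exact + width / 2


print(mid_point(100, 104))  # 5-point interval 100 to 104
print(mid_point(100, 109))  # 10-point interval 100 to 109
```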


2.6. Cumulative Frequency Distributions 


Situations occasionally arise where our concern is not with the frequencies 
within the class intervals themselves, but rather with the number or percentage 
of values “greater than” or “less than” a specified value. Such 
information may be made readily available by the preparation of a cumulative 
frequency distribution. The cumulative frequencies are obtained by 
adding successively the individual frequencies. Thus if the individual 
frequencies are denoted by $f_1, f_2, f_3, \ldots, f_k$, the cumulative frequencies 
are $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, and so on. Table 2.6 shows the cumulative 
frequencies and cumulative percentages for a distribution of intelligence 
quotients. 

TABLE 2.6 
CUMULATIVE FREQUENCIES AND CUMULATIVE PERCENTAGE FREQUENCIES FOR 
DISTRIBUTION OF INTELLIGENCE QUOTIENTS 

(1)               (2)          (3)           (4) 
Class interval    Frequency    Cumulative    Cumulative 
(IQ's)                         frequency     percentage 
                                             frequency 

130-134             1          106           100.0 
125-129             3          105            99.1 
120-124             4          102            96.2 
115-119            10           98            92.5 
110-114             8           88            83.0 
105-109            15           80            75.5 
100-104            20           65            61.3 
 95-99             14           45            42.5 
 90-94             11           31            29.2 
 85-89              8           20            18.9 
 80-84              6           12            11.3 
 75-79              5            6             5.7 
 70-74              0            1              .9 
 65-69              1            1              .9 

Total             106 
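The cumulative columns of Table 2.6 can be reproduced by successive addition, starting from the lowest interval. The sketch below is illustrative only; the variable names are our own, and the frequencies are those of col. 2 listed from the interval 65 to 69 upward.

```python
from itertools import accumulate

intervals = ["65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95-99",
             "100-104", "105-109", "110-114", "115-119", "120-124",
             "125-129", "130-134"]
frequencies = [1, 0, 5, 6, 8, 11, 14, 20, 15, 8, 10, 4, 3, 1]

cumulative = list(accumulate(frequencies))  # f1, f1 + f2, f1 + f2 + f3, ...
total = cumulative[-1]
percentages = [round(100 * c / total, 1) for c in cumulative]

# Print with the highest interval at the top, as in the table.
for name, f, cf, pct in reversed(list(zip(intervals, frequencies,
                                          cumulative, percentages))):
    print(f"{name:>8} {f:>4} {cf:>5} {pct:>7}")
```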


2.7. Tabular Representation 


Statistical data are frequently arranged and presented in the form of 
tables. Such tables should be designed to enable the reader to grasp with 
minimal effort the information which they intend to convey. While very 
considerable variety in the design of statistical tables is possible, a number of 
general rules should be observed. Kenney (1954) lists six such rules, and 
these are as follows:¹ 


1. Every table must be self-explanatory. To accomplish this the title should be 
short, but not at the expense of clearness. 

2. Full explanatory notes, when necessary, should be incorporated in the table, 
either directly under the descriptive title and before the body of the table, or else 
directly under the table. 

3. The columns and rows should be arranged in logical order to facilitate 
comparisons. 

4. In tabulating long columns of figures, space should be left after every five or 
ten rows. Long unbroken columns are confusing, especially when one is comparing 
two numbers in a row but in widely separated columns. 

5. If the numbers tabulated have more than three significant figures, the digits 
should be grouped in threes. Thus, one should write 4 685 732, not 4685732. 

6. Double lines at the top (or at the top and bottom) may enhance the effective- 
ness of a table. If the table nicely fills the width of a page, no side lines should be 
used. In such cases the omission of the side lines will have the tendency to empha- 
size the other vertical lines and cause the interior columns to stand out better. 
The columns should not be widely separated, and the form of a narrow, compact 
table should have its side lines. 


Tables presented as part of a manuscript should be appropriately numbered 
and should be inserted where possible in close proximity to the place 
where they are referred to in the text; otherwise the reader is put to some 
inconvenience. 

The appropriate design of statistical tables can become a matter of some 
complexity. This is particularly the case where it is necessary to present 
data which are cross-classified in a variety of ways. 


2.8. Graphic Representation of Frequency Distributions 


Graphic representation is often of great help in enabling us to comprehend 
the essential features of frequency distributions and in comparing one 
frequency distribution with another. A graph is the geometrical image of a 
set of data. It is a mathematical picture. It enables us to think about a 
problem in visual terms. Graphs are used not only in the practical handling 
of real sets of data, but also as visual models in thinking about statistical 
problems. Many problems can be reduced to visual form, and such reduction 
often facilitates their understanding and solution. Graphs have become a 
part of our everyday activity. Newspapers, popular magazines, trade 
publications, business reports, and scientific periodicals use graphic 
representation extensively. Graphic representation has been carefully studied, 
and much has been written on the subject. For a more detailed account 
than is given here, see Johnson and Jackson (1953, Chap. 3). While graphic 
representation has many ramifications, we shall consider here only those 
aspects of the subject which are useful in visualizing the important properties 
of frequency distributions and the ways in which one frequency distribution 
may differ from another. 

¹ Reproduced, with permission, from John F. Kenney and E. S. Keeping, Mathematics 
of statistics, part 1, 3d ed., copyright 1954, D. Van Nostrand Company, Inc., Princeton, 
N.J. 


2.9. Histograms 


A histogram is a graph in which the frequencies are represented by areas 
in the form of bars. Table 2.7 presents measures of auditory reaction time 
for a sample of 188 subjects. 


TABLE 2.7 
FREQUENCY DISTRIBUTION OF AUDITORY REACTION TIMES FOR A SAMPLE OF 
188 UNIVERSITY OF CHICAGO UNDERGRADUATES* 

Class interval,    Mid-point      Frequency    Cumulative 
sec                of interval                 frequency 

.34-.35            .345             2          188 
.32-.33            .325             2          186 
.30-.31            .305             4          184 
.28-.29            .285             5          180 
.26-.27            .265            11          175 
.24-.25            .245            17          164 
.22-.23            .225            28          147 
.20-.21            .205            69          119 
.18-.19            .185            37           50 
.16-.17            .165            12           13 
.14-.15            .145             1            1 

Total                             188 

* Adapted from L. L. Thurstone, A factorial study of perception, University of Chicago 
Press, Chicago, 1944. 


Figure 2.1 shows the frequencies plotted in the form of a histogram. To 
prepare such a histogram proceed as follows. Obtain a piece of suitably 
cross-sectioned graph paper. Paper subdivided into tenths of an inch with 
heavy lines 1 in. apart is convenient. Draw a horizontal line to represent 
reaction time in seconds and a vertical line to represent frequencies. Select 
an appropriate scale, both for reaction time and frequencies. In the present 
case if we allow 1/2 in. for each class interval and 1/20 in. for each unit of 
frequency, we obtain a graph roughly 6 in. long and 4 in. tall. The scale is 
arbitrary. The scale suggested in this case, however, results in a graph of 
convenient size. The mid-points of the intervals are written along the 
horizontal base line, and the frequency scale along the vertical. For each 
class interval the corresponding frequency is plotted and a horizontal line 
drawn the full length of the interval. To complete the graph we may join 
the ends of these lines to the corresponding ends of the intervals on the 
horizontal axis, although practice in this regard varies. Both the horizontal 
and vertical axes must be appropriately labeled. A concise statement of 
what the graph is about should accompany it. Observe that the width of 
each bar corresponds to the exact limits of the interval. Observe also that in 
this type of graph the frequencies are represented as equally distributed over 
the whole range of the interval. 

FIG. 2.1. Histogram for data of Table 2.7. Auditory reaction times for 188 students. 
(Horizontal axis: reaction time in seconds, marked at the mid-points .125 to .365; 
vertical axis: frequency.) 
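As a rough illustration of the construction just described, the sketch below computes the left edge, right edge, and height of each bar for the data of Table 2.7 and prints a crude text histogram. The function name `histogram_bars` is our own. Class intervals are entered in hundredths of a second, and the frequency 11 for the interval .26 to .27 is the value implied by the cumulative column (175 less 164).

```python
# Table 2.7, lowest interval first; limits in hundredths of a second.
intervals = [(14, 15), (16, 17), (18, 19), (20, 21), (22, 23), (24, 25),
             (26, 27), (28, 29), (30, 31), (32, 33), (34, 35)]
frequencies = [1, 12, 37, 69, 28, 17, 11, 5, 4, 2, 2]


def histogram_bars(intervals, frequencies):
    """Return (left edge, right edge, height) for each bar, in seconds.

    Each bar spans the exact limits of its class interval.
    """
    bars = []
    for (lower, upper), f in zip(intervals, frequencies):
        left = (lower - 0.5) / 100   # exact lower limit
        right = (upper + 0.5) / 100  # exact upper limit
        bars.append((left, right, f))
    return bars


for left, right, height in histogram_bars(intervals, frequencies):
    print(f"{left:.3f}-{right:.3f} | {'#' * height}")
```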


2.10. Frequency Polygons 


In a histogram we assume that all the cases within a class interval are 
uniformly distributed over the range of the interval. In a frequency polygon 
we assume that all cases in each interval are concentrated at the mid-point of 
the interval. In this fact resides the essential difference between a histogram 
and a frequency polygon. Instead of drawing a horizontal line the full 
length of the interval, as in the histogram, we make a dot above the mid-point 
of each interval at a height proportional to the frequency, and join adjacent 
dots by straight lines. It is customary to show an additional interval, with 
zero frequency, at each end of the horizontal scale and to join its mid-point 
to the dot of the adjacent interval. A frequency polygon 
based on the same data as the histogram in Fig. 2.1 is shown in Fig. 2.2. 
Observe that the frequency polygon in Fig. 2.2 is not a smooth continuous 
curve, since the lines joining the various points are straight lines. If 
we subdivide our intervals into smaller intervals, we shall of course obtain 
irregular frequencies, there being too few members in each interval. Consider, 
however, a circumstance where our intervals become smaller and 
smaller and at the same time the total number of cases becomes larger and 
larger. If we carry this process to the extreme situation where we have an 
indefinitely small interval and an indefinitely large number of cases, we arrive 
at the concept of a continuous frequency distribution. 

FIG. 2.2. Frequency polygon for data of Table 2.7. Auditory reaction times for 188 
students. (Horizontal axis: reaction time in seconds, marked at the mid-points .125 
to .365; vertical axis: frequency, 0 to 80.) 
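The points of a frequency polygon can be listed in the same spirit. This sketch assumes that the additional interval at each end is plotted at zero frequency, which agrees with the horizontal scale of Fig. 2.2 running from .125 to .365. The function name `polygon_points` is hypothetical, and mid-points are given in thousandths of a second to keep the arithmetic exact.

```python
# Mid-points of Table 2.7 in thousandths of a second, lowest first.
mid_points = [145, 165, 185, 205, 225, 245, 265, 285, 305, 325, 345]
frequencies = [1, 12, 37, 69, 28, 17, 11, 5, 4, 2, 2]


def polygon_points(mid_points, frequencies):
    """Return (mid-point, frequency) pairs, closed with a zero at each end."""
    step = mid_points[1] - mid_points[0]  # size of the class interval
    xs = [mid_points[0] - step] + list(mid_points) + [mid_points[-1] + step]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))


for x, y in polygon_points(mid_points, frequencies):
    print(f".{x:03d}  {y}")
```

Joining successive points by straight lines gives the polygon.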


2.11. Cumulative Frequency Polygons 


The drawing of a cumulative frequency polygon differs from that of a 
frequency polygon in two respects. First, instead of plotting points 
corresponding to frequencies, we plot points corresponding to cumulative 
frequencies. Second, instead of plotting points above the mid-point of each 
interval, we plot our points above the exact upper limit of the interval. 
This is done because we wish our graph to represent visually the number of 
cases falling above or below particular values. In plotting the cumulative 
frequency distribution shown in Table 2.7, we would plot the cumulative 
frequency 188 against the exact upper limit of the top interval, that is, 
.355, the frequency 186 against .335, and so on. Figure 2.3 shows the 
cumulative frequency distribution for the data appearing in the last column of 
Table 2.7. 

FIG. 2.3. Cumulative frequency polygon for data of Table 2.7. Auditory reaction times for 
188 students. (Horizontal axis: reaction time in seconds, .125 to .365; vertical axis: 
cumulative frequency, 0 to 200.) 


We may convert our raw frequencies to percentages such that all the 
frequencies added together add up to 100 instead of to the number of cases. 
We may then determine the cumulative percentage frequencies. We may 
then graph these frequencies and obtain thereby a cumulative percentage 
polygon, or ogive. The advantage of this type of diagram is that from it we 
can read off directly the percentage of observations greater than or less than 
any specified value. 
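A short sketch may make the plotting positions concrete. It pairs each exact upper limit of Table 2.7 with its cumulative frequency and then converts the cumulative frequencies to percentages for an ogive. Variable names are our own, and limits are given in thousandths of a second.

```python
from itertools import accumulate

# Exact upper limits of the intervals of Table 2.7, lowest interval first.
upper_limits = [155, 175, 195, 215, 235, 255, 275, 295, 315, 335, 355]
frequencies = [1, 12, 37, 69, 28, 17, 11, 5, 4, 2, 2]

cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Points of the cumulative frequency polygon.
points = list(zip(upper_limits, cumulative))

# Points of the cumulative percentage polygon, or ogive.
ogive = [(x, round(100 * c / total, 1)) for x, c in points]

for (x, c), (_, pct) in zip(points, ogive):
    print(f".{x:03d}  {c:>3}  {pct:>5}")
```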


2.12. Some Conventions for the Construction of Graphs 


1. In the graphing of frequency distributions it is customary to let the 
horizontal axis represent scores and the vertical axis frequencies. 

2. The arrangement of the graph should proceed from left to right. The 
low numbers on the horizontal scale should be on the left, and the low numbers 
on the vertical scale should be toward the bottom. 


3. The distance along either axis selected to serve as a unit is arbitrary 
and affects the appearance of the graph. Some writers suggest that the 
units should be selected such that the ratio of height to length is roughly 
3:5. This procedure seems to have some aesthetic advantages. 

4. Whenever possible the vertical scale should be so selected that a zero 
point falls at the point of intersection of the axes. With some data this 
procedure may give rise to a most unusual looking graph. In such cases it is 
customary to designate the point of intersection as the zero point and make a 
small break in the vertical axis. 

5. Both the horizontal and vertical axes should be appropriately labeled. 

6. Every graph should be assigned a descriptive title which states precisely 
what it is about. 


2.13. How Frequency Distributions Differ 


Comparison of a number of frequency distributions represented in either 
tabular or graphic form indicates that they differ one from another. An 
important problem in statistics is the identification and definition of prop- 
erties or attributes of frequency distributions which describe how they differ. 
It is customary to designate four important properties of frequency dis- 
tributions. These are central location, variation, skewness, and kurtosis. 
These properties may be viewed either as descriptive of the frequency dis- 
tribution itself or as descriptive of the set of observations of which the 
distribution is comprised. These alternatives are in effect synonymous. A 
frequency distribution is a particular kind of arrangement of a set of observa- 
tions. Central location, variation, skewness, and kurtosis may be discussed 
either with direct reference to sets of observations or with reference to the 
observations arranged in frequency-distribution form. 

Central location refers to a value of the variable near the center of the 
frequency distribution. It is a middle point. Measures of central loca- 
tion are called averages. These are discussed in detail in Chap. 3 of this 
book. 

Variation refers to the extent of the clustering about a central value. If 
all the observations are close to the central value, their variation will be less 
than if they tend to depart more markedly from the central value. Measures 
of variation are discussed in Chap. 4. 

Skewness refers to the symmetry or asymmetry of the frequency distribu- 
tion. If a distribution is asymmetrical and the larger frequencies tend to be 
concentrated toward the low end of the variable and the smaller frequencies 
toward the high end, it is said to be positively skewed. If the opposite holds, 
the larger frequencies being concentrated toward the high end of the variable 
and the smaller frequencies toward the low end, the distribution is said to be 
negatively skewed. 


TABLE 2.8 
HYPOTHETICAL DATA ILLUSTRATING FREQUENCY DISTRIBUTIONS OF DIFFERENT SHAPES 

[Table 2.8: frequencies over the class intervals 0-9 to 70-79 for nine 
distributions, each with N = 128. Columns: (1) class interval, (2) symmetrical 
binomial, (3) leptokurtic, (4) platykurtic, (5) rectangular, (6) bimodal, 
(7) U-shaped, (8) positively skewed, (9) negatively skewed, (10) J-shaped.] 


Kurtosis refers to the flatness or peakedness of one distribution in relation 
to another. If one distribution is more peaked than another, it may be 
spoken of as more leptokurtic. If it is less peaked, it is said to be more 
platykurtic. It is conventional to speak of a distribution as leptokurtic if it 
is more peaked than a particular type of distribution known as the normal 
distribution, and platykurtic if it is less peaked. The normal distribution is 
spoken of as mesokurtic, which means that it falls between leptokurtic and 
platykurtic distributions. 

Table 2.8 presents hypothetical data illustrating frequency distributions 
with different properties. The distribution in col. 2 is a symmetrical binomial, 
a type of distribution which is of much importance in statistical work and will 
be considered in detail in a later chapter. The distribution in col. 3 has 
central frequencies which are greater than those for the binomial. It is more 
peaked than the binomial, and as far as kurtosis is concerned, it can be said 
to be leptokurtic. The distribution in col. 4 has smaller central frequencies 
than the binomial and larger frequencies toward the extremities. It can be 
spoken of as platykurtic. The distribution in col. 5 has uniform frequencies 
over all class intervals and is described as rectangular. The distribution in 
col. 6 has two humps, or modes. It is said to be bimodal. In the distribu- 
tion in col. 7 the largest frequencies occur at the extremities whereas the 
central frequencies are the smallest. Such a distribution is said to be 
U-shaped. The distributions in cols. 2 to 7 are all symmetrical and have the 
same measures of central location although they differ in variation. Column 
8 illustrates a positively skewed and col. 9 a negatively skewed distribution. 
Extreme skewness leads to the type of distribution shown in col. 10, which is 
described as J-shaped. 


2.14. The Properties of Frequency Distributions Represented 
Graphically 


The differing characteristics of frequency distributions can be readily 
represented in graphical form. Consider the three distributions in Fig. 2.4. 
These distributions appear identical in shape. They are markedly different, 
however, in terms of the central values about which the observations in each 
distribution appear to concentrate; that is, they have different averages 
although they may be identical in all other respects. Distribution A has a 
lower average than B and B than C. 

Now consider the distributions in Fig. 2.5. Inspection of these three 
distributions suggests that while the observations in each case appear to 
concentrate about the same average, they are nonetheless markedly different 
one from another. In the case of distribution A the observations appear to 
be more closely concentrated about the average than in the case of B, and 
the same applies to B in relation to C. Thus these distributions differ in 
variation. The observations in A are less variable than the observations in 
B, and those in B are less variable than those in C. 

Examine now the distributions in Fig. 2.6. These three distributions have 
different averages and possibly different measures of variation. They differ 
also in skewness. Distribution B is symmetrical about the average; that is, 
if we were to fold it over about the average, we should find that it had the 
same shape on both sides. A and C are asymmetrical, the shape to the left 
of the average being different from the shape to the right. Distribution A is 
positively skewed, the longer tail extending toward the high end of the scale. 
Distribution C is negatively skewed, the longer tail extending toward the low 
end of the scale. 

FIG. 2.4. Three frequency distributions identical in shape but with different averages. 
(Horizontal axis: score; vertical axis: frequency.) 

FIG. 2.5. Three frequency distributions with the same average but with different variation. 
(Horizontal axis: score.) 


Consider now the graphical representation of kurtosis as shown in Fig. 2.7. 
Distribution A is a symmetrical bell-shaped distribution known as the normal 
distribution. Distribution B is observed to be flatter on top than the normal 
distribution and is referred to as platykurtic, while distribution C is more 
peaked than the normal and is spoken of as leptokurtic. 

FIG. 2.6. Three frequency distributions, A, B, and C, differing in skewness. 
(Horizontal axis: score.) 

FIG. 2.7. Three frequency distributions differing in kurtosis. 
(Horizontal axis: score.) 


In the above discussion the meaning which attaches to the descriptive 
properties of collections of measurements arranged in frequency distributions 
is largely intuitive and is derived from the inspection of distributions in 
tabular or graphic form. To proceed with the study of data interpretation 
we require precisely defined numerical measures of central location, variation, 
skewness, and kurtosis. Chapters 3 and 4 to follow are concerned with the 
more precise and formal delineation of these properties, their numerical 
description, and calculation. 


EXERCISES 


1. In preparing frequency distributions for the following data write down an acceptable 
set of class intervals, the exact limits of the intervals, and their mid-points: (a) error 
scores ranging from 24 to 87 made by a sample of rats in running a maze; (b) intelligence 
quotients ranging from 96 to 137 for a group of school children; (c) scholastic-aptitude- 
test scores ranging from 227 to 896 obtained by a group of university students; (d) 
response latencies ranging from .276 to .802 sec for a group of experimental subjects; 
(e) supervisors’ ratings from 0 to 9 for a group of industrial employees. 

2. The following are marks obtained by a group of 40 university students on an English 
examination: 


42 GS "35" JS "GE 03 A 73 te 
96 30 552 76 FONT "0 
83--9452...,85./ 799 69. ^56 Ар 36 
52. /65. 49 80 67 SO 88 8 
"o4 71 722 87 9 82 89 79 


Prepare a frequency distribution and a cumulative frequency distribution for these data 
using a class interval of 5. 

3. Prepare a histogram for the data in Exercise 2 above. 

4. Prepare a cumulative frequency polygon for the data in Exercise 2 above. 

5. Toss 10 coins 100 times and make a distribution of the frequency of heads. 

6. Frequency distributions of intelligence quotients are available for (a) a random sample 
of the population at large, and (b) a random sample of university students. In what 
ways and for what reasons might you expect these two distributions to differ? 


CHAPTER 3 


AVERAGES 


3.1. Introduction 


Measures of central location used in the description of frequency distribu- 
tions are called averages. In common usage the word "average" is often 
employed to refer to a value obtained by adding together a set of measure- 
ments and then dividing by the number of measurements in the set. This is 
one type of average only and is called the arithmetic mean. In general, an 
average is a central reference value which is usually fairly close to the point 
of greatest concentration of the measurements and may in some sense be 
thought to typify the whole set. Any particular measurement may be 
viewed as a certain distance above or below the average. Averages in 
common use are the arithmetic mean, median, mode, geometric mean, and 
harmonic mean. The most important and widely used of these is the 
arithmetic mean. 


3.2. The Arithmetic Mean 


By definition the arithmetic mean is the sum of a set of measurements 
divided by the number of measurements in the set. Consider the following 
measurements: 7, 13, 22, 9, 11, 4. The sum of these six measurements is 66. 
The arithmetic mean is therefore 66 divided by 6, or 11. 

In general, if $N$ measurements are represented by the symbols $X_1, X_2, 
X_3, \ldots, X_N$, the arithmetic mean in algebraic language is as follows: 

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N} \tag{3.1}$$

The symbol $\bar{X}$, spoken of as X bar, is used to denote the arithmetic mean of 
the values of $X$. $\sum$, the Greek letter sigma, describes the operation of 
summing the $N$ measurements. The summation extends from $i = 1$ to 
$i = N$. 
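The definition of the arithmetic mean translates directly into code. The following sketch uses the six measurements of the example above; the function name `arithmetic_mean` is our own.

```python
def arithmetic_mean(measurements):
    """Sum of the measurements divided by their number, as in formula (3.1)."""
    return sum(measurements) / len(measurements)


X = [7, 13, 22, 9, 11, 4]
print(sum(X))              # sum of the six measurements
print(arithmetic_mean(X))  # the sum divided by N = 6
```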


3.3. The Weighted Arithmetic Mean 


Consider a situation where different values of X occur more than once. 
The arithmetic mean is then obtained by multiplying each value of X by the 
frequency of its occurrence, adding together these products, and then dividing 
by the total number of measurements. Consider the following measure- 
ments: 11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 15, 15, 15, 16, 16, 17, 17, 18. 
The value 11 occurs with a frequency of 2, 12 with a frequency of 3, 13 with a 
frequency of 5, and so on. These data may be written as follows: 


$X_i$      $f_i$      $f_iX_i$ 

18          1          18 
17          2          34 
16          2          32 
15          3          45 
14          2          28 
13          5          65 
12          3          36 
11          2          22 

Total      20         280 


This is a frequency distribution with a class interval of 1. The symbol $f_i$ 
is used to denote the frequency of occurrence of the particular value $X_i$. 
Multiplying each value $X_i$ by the frequency of its occurrence and adding 
together the products $f_iX_i$, we obtain the sum 280. The arithmetic mean is 
then 280 divided by 20, or 14.0. 
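The calculation may be checked with a few lines of code. This is an illustrative sketch only; `weighted_mean` is a name of our own choosing. It also confirms that weighting each distinct value by its frequency gives the same result as averaging the 20 original measurements.

```python
def weighted_mean(values, frequencies):
    """Sum of frequency-times-value products divided by the total frequency."""
    total = sum(f * x for x, f in zip(values, frequencies))
    return total / sum(frequencies)


values = [18, 17, 16, 15, 14, 13, 12, 11]
frequencies = [1, 2, 2, 3, 2, 5, 3, 2]

raw = [11, 11, 12, 12, 12, 13, 13, 13, 13, 13,
       14, 14, 15, 15, 15, 16, 16, 17, 17, 18]

print(weighted_mean(values, frequencies))  # from the frequency table
print(sum(raw) / len(raw))                 # from the raw measurements
```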

In general, where $X_1, X_2, X_3, \ldots, X_k$ occur with frequencies $f_1, f_2, 
f_3, \ldots, f_k$, where $k$ is the number of different values of $X$, the arithmetic 
mean 

$$\bar{X} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots + f_kX_k}{N} = \frac{\sum_{i=1}^{k} f_iX_i}{N} \tag{3.2}$$

Observe that here the summation is over $k$ terms, the number of different 
values of the variable $X$. Observe also that $\sum_{i=1}^{N} X_i = \sum_{i=1}^{k} f_iX_i$. The mean 
$\bar{X}$ obtained in this way is sometimes called the weighted arithmetic mean, the 
idea here being that each value of $X$ is weighted by the frequency of its 
occurrence. 


3.4. Calculating the Mean from Frequency Distributions 


Consideration of the weighted arithmetic mean suggests a simple method 
for calculating the mean from data grouped in the form of a frequency 
distribution regardless of the size of the class interval. The mid-point of the 
interval may be used to represent all values falling within the interval. 
We assume that the variable $X$ takes values corresponding to the mid-points 
of the intervals, and these are weighted by the frequencies. We multiply 
the mid-points of the intervals by the frequencies, sum these products, and 
divide this sum by $N$ to obtain the mean. More explicitly, the steps involved 
are as follows. First, calculate the mid-points of all intervals. Second, 
multiply each mid-point by the corresponding frequency. Third, sum the 
products of mid-points by frequencies. Fourth, divide this sum by $N$ to 
obtain the mean. To illustrate, consider Table 3.1. 


TABLE 3.1 
CALCULATING THE MEAN FOR DISTRIBUTION OF TEST SCORES: LONG METHOD 

(1)          (2)          (3)          (4) 
Class        Mid-point    Frequency    Frequency × mid-point 
interval     $X_i$        $f_i$        $f_iX_i$ 

45-49        47            1              47 
40-44        42            2              84 
35-39        37            3             111 
30-34        32            6             192 
25-29        27            8             216 
20-24        22           17             374 
15-19        17           26             442 
10-14        12           11             132 
 5-9          7            2              14 
 0-4          2            0               0 

Total                     76           1,612 


$$\sum_{i=1}^{k} f_iX_i = 1{,}612 \qquad \bar{X} = \frac{1{,}612}{76} = 21.21$$

The mid-points of the intervals $X_i$ appear in col. 2. The frequencies $f_i$ 
appear in col. 3. The products of the mid-points by the frequencies $f_iX_i$ 
are shown in col. 4. The sum of these products $\sum_{i=1}^{k} f_iX_i$ is 1,612, $N$ is 76, 
and the mean is obtained by dividing 1,612 by 76 and is 21.21. 
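The four steps of the long method can be followed in code. The sketch below uses the class intervals and frequencies of Table 3.1; variable names are our own. For integer scores the mid-point is taken as the average of the written limits, which is the same as adding half the interval size to the lower exact limit.

```python
intervals = [(45, 49), (40, 44), (35, 39), (30, 34), (25, 29),
             (20, 24), (15, 19), (10, 14), (5, 9), (0, 4)]
frequencies = [1, 2, 3, 6, 8, 17, 26, 11, 2, 0]

# Step 1: mid-points of all intervals.
mid_points = [(lower + upper) / 2 for lower, upper in intervals]

# Step 2: multiply each mid-point by the corresponding frequency.
products = [f * x for f, x in zip(frequencies, mid_points)]

# Step 3: sum the products.
sum_fx = sum(products)

# Step 4: divide by N to obtain the mean.
n = sum(frequencies)
mean = sum_fx / n

print(sum_fx, n, round(mean, 2))
```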


3.5. Change of Origin and Unit 


A series of measurements may be conceptualized as points on a line measured 
in appropriate units from an origin or zero point. Thus particular 
measurements, say, 48 or 72 in., may be regarded as points 48 and 72 units, 
respectively, from a zero origin, the unit of measurement here being the inch. 
It is frequently useful in statistical work to change the origin and to represent 
the measurements as deviations from a new origin. The new origin may be 
chosen arbitrarily, or it may be the arithmetic mean. Consider the measurements 
7, 13, 22, 9, 11, and 4. Select an arbitrary origin, say, 9. The measurements 
represented as deviations from this origin become -2, 4, 13, 0, 2, 
and -5. The measurements represented as deviations from the arithmetic 
mean, in this case 11, are -4, 2, 11, -2, 0, and -7. 

Algebraically, a deviation from any arbitrary origin may be represented by 

$$x_i' = X_i - X_0$$

where $x_i'$ is a deviation of the measurement $X_i$ from an arbitrary origin $X_0$. 
A deviation from the arithmetic mean may be represented by 

$$x_i = X_i - \bar{X}$$

where $x_i$ is a deviation of the measurement $X_i$ from the mean $\bar{X}$. The symbol 
$x_i$ will be used frequently in this book to refer to a deviation from the 
arithmetic mean. Both the above expressions are simple transformations of the 
measurements involving a change in origin. 

Situations arise where a change in unit is involved. To illustrate, we may 
convert inches to feet by dividing by 12, or ounces to pounds by dividing by 
16. This is a simple change in the unit of measurement. On occasion both 
a change in unit and a change in origin are required. The deviations of the 
measurements 7, 13, 22, 9, 11, 4 about the arbitrary origin 9 are -2, 4, 13,
0, 2, -5. If these deviations are now divided by any number, say, 2, a
change in unit results and the deviations become -1, 2, 6.5, 0, 1, and -2.5.
In this case the unit of measurement is twice as large as it was before. 

A deviation from any arbitrary origin with a change in unit may be
written as

$$x_i' = \frac{X_i - X_0}{h}$$

where $h$ is the new unit. This expression may be spoken of as a transformation
involving both a change in origin and a change in unit.
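
For readers who wish to check the arithmetic by computer, the change of origin and unit may be sketched in Python. The function name `transform` and its argument names are illustrative assumptions and are not part of the notation of this book.

```python
def transform(measurements, origin, unit=1):
    """Return (X_i - X_0) / h for each measurement X_i."""
    return [(x - origin) / unit for x in measurements]


measurements = [7, 13, 22, 9, 11, 4]
mean = sum(measurements) / len(measurements)

from_nine = transform(measurements, origin=9)        # change of origin only
from_mean = transform(measurements, origin=mean)     # origin at the mean
halved = transform(measurements, origin=9, unit=2)   # origin 9, unit 2

print(from_nine)
print(from_mean)
print(halved)
```

The three lists correspond to the three sets of deviations worked out in the text.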


3.6. A Short Method of Calculating the Mean from Frequency 
Distributions 


A change of origin and unit may be used to reduce the arithmetical labor in
calculating the mean from data grouped in frequency-distribution form.
This method is illustrated with reference to Table 3.2.

In col. 2 the frequencies are recorded. First, select the mid-point of any 
class interval as an arbitrary origin, or assumed mean. The selection of an 
arbitrary origin near the middle of the distribution simplifies the arithmetic. 




In the present example the arbitrary origin is taken as the mid-point of the 
interval 20 to 24, which is 22, and 0 is written opposite that interval in col. 3. 
The mid-point of the interval 25 to 29 is one unit of class interval above the 
arbitrary origin, and 1 is written opposite this interval. The mid-point of 
the next interval, 30 to 34, is two units of class interval above the arbitrary 
origin; hence 2 is written opposite this interval. The procedure simply
amounts to writing +1, +2, +3, and so on, opposite the intervals above the
arbitrary origin, and -1, -2, -3, and so on, opposite the intervals below the
arbitrary origin. These numbers, which appear in col. 3, are referred to as
the computation variable and are represented by the symbol $x_i'$. They are
the deviations of the mid-points of the class intervals from an arbitrary
origin in units of class interval. Second, multiply the frequencies by the
computation variable with due regard to sign as shown in col. 4. Third,
add col. 4 to obtain $\sum_{i=1}^{k} f_i x_i'$, the sum of deviations about the arbitrary origin
in units of class interval. In the present example this sum is -12. Fourth,
divide this sum by $N$ and multiply the result by $h$, the class interval. Here
we divide -12 by 76 and multiply the result by 5 to obtain -.79. The
fifth step involves the addition of the quantity thus obtained, -.79, to the
arbitrary origin 22 to obtain the mean. The mean is then 21.21.


TABLE 3.2
CALCULATING THE MEAN FOR DISTRIBUTION OF TEST SCORES: SHORT METHOD

(1)        (2)         (3)             (4)         (5)               (6)
Class      Frequency   Computation     f_i x_i'    New computation   f_i x_i''
interval   f_i         variable x_i'               variable x_i''
45-49        1            5               5            6                6
40-44        2            4               8            5               10
35-39        3            3               9            4               12
30-34        6            2              12            3               18
25-29        8            1               8            2               16
20-24       17            0               0            1               17
15-19       26           -1             -26            0                0
10-14       11           -2             -22           -1              -11
5-9          2           -3              -6           -2               -4
0-4          0           -4               0           -3                0
Total       76                          -12                            64

$$\sum_{i=1}^{k} f_i x_i' = -12 \qquad \bar{X} = 22 + 5(-12/76) = 21.21$$

Check:

$$\sum_{i=1}^{k} f_i x_i'' = 64 \qquad \bar{X} = 17 + 5(64/76) = 21.21$$

Let us summarize the steps involved: 

1. Select an arbitrary origin and write down the computation variable. 

2. Multiply the frequencies by the computation variable. 

3. Sum these products with due regard to sign. 

4. Divide this sum by N and multiply by h, the class interval.

5. Add the result to the arbitrary origin to obtain the arithmetic mean. 

This procedure is ordinarily accomplished by the application of a simple
formula:

$$\bar{X} = X_0 + h\,\frac{\sum_{i=1}^{k} f_i x_i'}{N} \tag{3.3}$$

where $X_0$ = arbitrary origin
      $f_i$ = frequencies
      $x_i'$ = computation variable
      $N$ = number of cases
      $h$ = class interval
In the present example

$$\bar{X} = 22 + 5 \times \frac{-12}{76} = 21.21$$

To check the calculation, select a new arbitrary origin as shown in col. 5 of
Table 3.2 and repeat the calculation. Note that the difference between
$\sum_{i=1}^{k} f_i x_i'$ and $\sum_{i=1}^{k} f_i x_i''$ in Table 3.2 is equal to $N$. This provides a check on the
calculation thus far and will hold where the two arbitrary origins are one
class interval removed from each other.
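
The short method and its check may be sketched in Python as follows. The function `short_method_mean` and its parameters are hypothetical names introduced for illustration; the frequencies are those of Table 3.2, listed from the lowest class interval (0 to 4) to the highest (45 to 49).

```python
def short_method_mean(frequencies, origin_index, origin_midpoint, class_interval):
    """Mean of grouped data by the short method of Eq. (3.3).

    frequencies are ordered from the lowest class interval upward;
    origin_index is the position of the interval chosen as arbitrary origin.
    Returns the mean and the sum of products of frequency and
    computation variable.
    """
    n = sum(frequencies)
    # Computation variable: ..., -2, -1, 0, +1, +2, ... in units of class interval.
    coded = [i - origin_index for i in range(len(frequencies))]
    total = sum(f * x for f, x in zip(frequencies, coded))
    return origin_midpoint + class_interval * total / n, total


frequencies = [0, 2, 11, 26, 17, 8, 6, 3, 2, 1]

# Origin at the mid-point of the interval 20 to 24.
mean_1, total_1 = short_method_mean(frequencies, 4, 22, 5)
# Check: origin one class interval lower, at the mid-point of 15 to 19.
mean_2, total_2 = short_method_mean(frequencies, 3, 17, 5)

print(round(mean_1, 2), total_1)
print(round(mean_2, 2), total_2)
print(total_2 - total_1 == sum(frequencies))
```

Both origins lead to the same mean, and the two sums of products differ by N, which is the check described in the text.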


3.7. The Machine Calculation of the Mean 


With the widespread use of modern calculating machines many workers 
prefer to compute the mean by adding the measurements directly and divid- 
ing by N, unless the number of measurements is large. Where the number of 
measurements is large it may sometimes be more convenient to group the 
data in the form of a frequency distribution and to calculate the mean by the 
short method outlined above. 

Methods of computation can be developed to suit the capabilities of dif- 
ferent types of machines. In computing the mean from grouped data using 
a calculating machine the following procedure will prove convenient:

1. Select as an arbitrary origin the mid-point of the lowest class interval.

2. Write down the computation variable 0, 1, 2, 3, and so on, opposite the
frequencies.

3. Obtain the sum of products directly on the machine. This is done by
multiplying the frequencies by the computation variable and allowing the
machine to accumulate the products. The quantity thus obtained is $\sum_{i=1}^{k} f_i x_i'$.

4. Use this quantity to compute the mean from the formula in the usual
way.

As a check we may select the mid-point of the highest class interval as the
arbitrary origin and repeat the calculation. With a little practice in the
use of this method step 2 may be omitted and we may obtain $\sum_{i=1}^{k} f_i x_i'$ directly
from the machine without the necessity of writing down any numbers at all.


3.8. The Mean of Combined Groups 


Consider a group of $n_1$ measurements with mean $\bar{X}_1$ and a set of $n_2$ measurements
with mean $\bar{X}_2$. Denote the ith measurement in the first group by the
symbol $X_{i1}$ and the ith measurement in the second group by the symbol $X_{i2}$.
The first subscript identifies the particular measurement. The second
subscript identifies the group. Thus $X_{72}$ would in this notation identify
the seventh measurement in the second group of measurements. Let
$n_1 + n_2 = N$, the total number of measurements in the two groups. The
mean of all the measurements in the two groups taken together is

$$\bar{X} = \frac{\sum_{i=1}^{n_1} X_{i1} + \sum_{i=1}^{n_2} X_{i2}}{n_1 + n_2} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{N} \tag{3.4}$$


To illustrate, the mean of the four measurements 1, 3, 8, and 8 is 5. The 
mean of the six measurements 4, 4, 5, 6, 8, and 15 is 7. The mean of all ten 
measurements taken together is then 


$$\bar{X} = \frac{4 \times 5 + 6 \times 7}{10} = 6.2$$

The above result may be extended to apply to any number of groups. With
more than two groups, say, $k$, we simply multiply the number of cases in each
group by the group mean, sum the $k$ products thus obtained, and divide by
$N$, the total number of measurements in the $k$ groups. Thus with $k$ groups

$$\bar{X} = \frac{\sum_{j=1}^{k} n_j \bar{X}_j}{N} \tag{3.5}$$
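
A minimal Python sketch of the combined mean follows. The function name `combined_mean` is an assumption made for illustration; the data are the two groups of the worked example.

```python
def combined_mean(sizes, means):
    """Mean of all measurements, given each group's size and mean (Eq. 3.5)."""
    n = sum(sizes)
    return sum(n_j * mean_j for n_j, mean_j in zip(sizes, means)) / n


group_1 = [1, 3, 8, 8]
group_2 = [4, 4, 5, 6, 8, 15]

mean_1 = sum(group_1) / len(group_1)
mean_2 = sum(group_2) / len(group_2)

from_groups = combined_mean([len(group_1), len(group_2)], [mean_1, mean_2])
# Direct calculation on the pooled measurements, as a check.
direct = sum(group_1 + group_2) / (len(group_1) + len(group_2))

print(mean_1, mean_2, from_groups, direct)
```

The weighted calculation and the direct calculation on the pooled measurements agree.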




3.9. Some Properties of the Arithmetic Mean 


The sum of the deviations of all the measurements in a set from their arithmetic 
mean is zero. The arithmetic mean of the measurements 7, 13, 22, 9, 11, and
4 is 11. The deviations of these measurements from this mean are -4, 2, 11,
-2, 0, and -7. The sum of these deviations is zero.

Proof of this result is as follows:

$$\sum_{i=1}^{N} (X_i - \bar{X}) = \sum_{i=1}^{N} X_i - N\bar{X} = N\bar{X} - N\bar{X} = 0$$

Observe that since $\bar{X} = \left(\sum_{i=1}^{N} X_i\right)/N$ it follows that $\sum_{i=1}^{N} X_i = N\bar{X}$. Also
adding $\bar{X}$, the mean, $N$ times is the same as multiplying $\bar{X}$ by $N$; thus if $\bar{X}$ is 11
and $N$ is 6, we observe that 11 + 11 + 11 + 11 + 11 + 11 = 6 × 11 = 66.

The sum of squares of deviations about the arithmetic mean is less than the sum
of the squares of deviations about any other value. The deviations of the
measurements 7, 13, 22, 9, 11, 4 from the mean 11 are -4, 2, 11, -2, 0, -7.
The squares of these deviations are 16, 4, 121, 4, 0, 49. The sum of squares is
194. Had any other origin been selected, the sum of squares of deviations
would be greater than the sum of squares about the mean. Select a different
origin, say, 13. The deviations are -6, 0, 9, -4, -2, -9. Squaring these we
have 36, 0, 81, 16, 4, 81. The sum of these squares is 218, which is greater
than the sum of squares about the mean. Selection of any other origin will
demonstrate the same result.

This property of the mean indicates that it is the centroid, or center of
gravity, of the set of measurements. Indeed, the mean is the central value
about which the sum of squares of deviations is a minimum. This result
may be readily demonstrated. Consider deviations from an origin $\bar{X} + c$,
where $c \neq 0$. A deviation of an observation from this origin is

$$X_i - (\bar{X} + c) = (X_i - \bar{X}) - c \tag{3.6}$$

Squaring and summing over $N$ observations we obtain

$$\sum_{i=1}^{N} [X_i - (\bar{X} + c)]^2 = \sum_{i=1}^{N} (X_i - \bar{X})^2 + \sum_{i=1}^{N} c^2 - 2c \sum_{i=1}^{N} (X_i - \bar{X}) \tag{3.7}$$

Because the sum of deviations about the mean is zero, the third term to the
right is zero. Also $c^2$ summed $N$ times is $Nc^2$, and we write

$$\sum_{i=1}^{N} [X_i - (\bar{X} + c)]^2 = \sum_{i=1}^{N} (X_i - \bar{X})^2 + Nc^2 \tag{3.8}$$




This expression states that the sum of squares of deviations about an origin
$\bar{X} + c$ may be viewed as comprised of two parts, the sum of squares of deviations
about the mean $\bar{X}$ and $Nc^2$. The quantity $Nc^2$ is always positive.
Hence the sum of squares of deviations about an origin $\bar{X} + c$ will always be
greater than the sum of squares about $\bar{X}$. Thus the sum of squares of
deviations about the arithmetic mean is less than the sum of squares of
deviations about any other value. The sum of squares about the mean is a
minimum.
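
Both properties may be verified numerically. The following Python sketch uses the six measurements of the example; the helper `sum_of_squares` is a hypothetical name introduced here, and the origin 13 corresponds to c = 2 in the notation of Eq. (3.8).

```python
def sum_of_squares(values, origin):
    """Sum of squared deviations of the values about the given origin."""
    return sum((x - origin) ** 2 for x in values)


measurements = [7, 13, 22, 9, 11, 4]
n = len(measurements)
mean = sum(measurements) / n

deviation_sum = sum(x - mean for x in measurements)   # first property
ss_mean = sum_of_squares(measurements, mean)          # about the mean
ss_13 = sum_of_squares(measurements, 13)              # about the origin 13
c = 13 - mean

print(deviation_sum, ss_mean, ss_13)
# Eq. (3.8): the excess over the minimum is N times c squared.
print(ss_mean + n * c ** 2)
```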

Any mean calculated on a random sample of size N is an estimate of a 
population mean, which is the value obtained where it is possible to measure 
all members of the population. The mean has the property that for most 
distributions it is a better, or more accurate, or more efficient, estimate of the 
population mean than other measures of central location such as the median 
and the mode. This is one reason why it is most frequently used. Proof of 
this result is beyond the scope of this book. 

Reference has been made to a number of properties of the arithmetic mean. 
What importance attaches to these properties, or why should they be dis- 
cussed? The fact that the sum of deviations about the mean is zero greatly 
simplifies many forms of algebraic manipulation. Any term involving the 
sum of deviations about the mean will vanish. The fact that the sum of 
squares of deviations about the mean is a minimum in effect implies an 
alternative definition of the mean; namely, the mean is that measure of 
central location about which the sum of the squares is a minimum. In effect, 
the mean is a measure of central location in the least-squares sense. The 
method of least squares is of considerable importance in statistics and is used, 
for example, in the fitting of lines and curves. The mean may be regarded 
as a point located by the method of least squares. The properties pertaining 
to change of origin and change of unit are of importance in that they lead to 
simplified methods of computing the mean where a fairly large number of 
observations is involved. The fact that the sample mean provides a better 
estimate of a population parameter than other measures of central location 
is of primary importance. Throughout statistics we are concerned with the 
problem of making statements about population values from our knowledge 
of sample values. Obviously, the more accurate these statements are, the better.


3.10. The Median 


Another commonly used measure of central location is the median. The
median is a point on a scale such that half the observations fall above it and
half below it. The observations 2, 7, 16, 19, 20, 25, and 27 are arranged in
order of magnitude. Here N is an odd number and the median is 19; three
observations fall above it and three below it. If another observation, say,
31, is included, the median is then taken as the arithmetic mean of the two
middle values 19 and 20; that is, the median is (19 + 20)/2, or 19.5. Consider
a situation where certain values of the variable occur more than once, as,
for instance, with the observations 7, 7, 7, 8, 8, 8, 9, 9, 10, 10. The three 8's
are assumed to occupy the interval 7.5 to 8.5. The median is obtained by
interpolation. Three observations fall below 7.5, and five, or half of the ten,
must fall below the median. In this instance we must therefore interpolate
two-thirds of the way into the interval to obtain a point above and below
which half the observations fall. The median is then taken as 7.5 + 0.67 = 8.17.

With a frequency distribution represented in graphical form, the ordinate 
at the median divides the total area under the curve into two equal parts. 
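
The three cases just described may be checked with Python's standard `statistics` module. This is offered as a sketch only: `statistics.median_grouped` is used here on the assumption that each observation stands for an interval of width 1 centered on it, which is the convention adopted above for the three 8's.

```python
import statistics

odd = [2, 7, 16, 19, 20, 25, 27]
even = odd + [31]
tied = [7, 7, 7, 8, 8, 8, 9, 9, 10, 10]

median_odd = statistics.median(odd)      # the middle value
median_even = statistics.median(even)    # mean of the two middle values
# Interpolates within the interval 7.5 to 8.5 occupied by the 8's.
median_tied = statistics.median_grouped(tied, interval=1)

print(median_odd, median_even, round(median_tied, 2))
```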


3.11. Calculating the Median from Frequency Distributions 


In calculating the median from data grouped in the form of a frequency 
distribution the problem is to determine a value of the variable such that 
one-half the observations fall above this value and the other half below. 
The method will be illustrated with reference to the data in Table 3.3. 


TABLE 3.3
FREQUENCY DISTRIBUTION OF PSYCHOLOGICAL TEST SCORES

(1)              (2)         (3)
Class interval   Frequency   Cumulative frequency
45-49             1          76
40-44             2          75
35-39             3          73
30-34             6          70
25-29             8          64
20-24            17          56
15-19            26          39
10-14            11          13
5-9               2           2
0-4               0           0
Total            76


First, record the cumulative frequencies as shown in col. 3. Second,
determine N/2, one-half the number of cases, in this example 38. Third,
find the class interval in which the 38th case, the middle case, falls. The
38th case falls within the interval 15 to 19, and the exact limits of this interval
are 14.5 and 19.5. Clearly, the 38th case falls very close to the top of this
interval because we know from an examination of our cumulative frequencies
that 39 cases fall below the top of this interval, that is, below 19.5. Fourth,
interpolate between the exact limits of the interval to find a value above and
below which 38 cases fall. To interpolate, observe that 26 cases fall within
the limits 14.5 and 19.5, and we assume that these 26 cases are uniformly
distributed in rectangular fashion between these exact limits. Now to arrive
at the 38th, or middle, case, we require 25 of the 26 cases within this interval,
because 2 + 11 + 25 = 38. This means that we must find a point between
14.5 and 19.5 such that 25 cases fall below and 1 case above this point. The
proportion of the interval we require is 25/26, which is 25/26 × 5 units of score, or
4.81. We add this to the lower limit of the interval to obtain the median,
which is 14.50 + 4.81, or 19.31.

Let us summarize the steps involved: 

1. Compute the cumulative frequencies. 

2. Determine N/2, one-half the number of cases.

3. Find the class interval in which the middle case falls, and determine the 
exact limits of this interval. 

4. Interpolate to find a value on the scale above and below which one-half 
the total number of cases falls. This is the median. 

For the student who has difficulty in following the above a simple formula
may be employed.

$$\text{Median} = L + \frac{N/2 - F}{f_m}\,h \tag{3.9}$$

where $L$ = exact lower limit of interval containing the median
      $F$ = sum of all frequencies below $L$
      $f_m$ = frequency of interval containing median
      $N$ = number of cases
      $h$ = class interval
In the present example $L$ = 14.5, $F$ = 13, $f_m$ = 26, $N$ = 76, and $h$ = 5. We
then have

$$\text{Median} = 14.5 + \frac{38 - 13}{26} \times 5 = 19.31$$
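
Formula (3.9) may be sketched in Python as follows. The function `grouped_median` and its argument names are illustrative assumptions; the exact lower limits and frequencies are those of Table 3.3, listed from the lowest class interval upward.

```python
def grouped_median(lower_limits, frequencies, class_interval):
    """Median of grouped data by interpolation, following Eq. (3.9)."""
    half = sum(frequencies) / 2
    below = 0  # F: sum of all frequencies below the current interval
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and below + f >= half:
            # The middle case falls in this interval; interpolate within it.
            return lower + (half - below) / f * class_interval
        below += f
    raise ValueError("the distribution contains no cases")


lower_limits = [-0.5, 4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
frequencies = [0, 2, 11, 26, 17, 8, 6, 3, 2, 1]

median = grouped_median(lower_limits, frequencies, 5)
print(round(median, 2))
```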


3.12. The Mode 


Another measure of central location is the mode. In situations where 
different values of X occur more than once the mode is the most frequently 
occurring value. Consider the observations 11, 11, 12, 12, 12, 13, 13, 13, 13, 
13, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 18. Here the value 13 occurs 5 times, 
more frequently than any other value; hence the mode is 13. 

In situations where all values of X occur with equal frequency, where that 
frequency may be equal to or greater than 1, no modal value can be calculated. 
Thus for the set of observations 2, 7, 16, 19, 20, 25, and 27 no mode can be 
obtained. Similarly, the observations 2, 2, 2, 7, 7, 7, 16, 16, 16, 19, 19, 19, 
20, 20, 20, 25, 25, 25, 27, 27, 27 do not permit the calculation of a modal 
value. All values occur with a frequency of 3. 




In the case where two adjacent values of X occur with the same frequency, 
which is larger than the frequency of occurrence of other values of X, the 
mode may be taken rather arbitrarily as the mean of the two adjacent values 
of X. Consider the observations 11, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 
14, 15, 15, 16, 16, 17, 18. Here the values 13 and 14 both occur with a 
frequency of 4, which is greater than the frequency of occurrence of the 
remaining values. The mode may be taken as (13 + 14)/2, or 13.5. 

Where two nonadjacent values of X occur such that the frequencies of both 
are greater than the frequencies in adjacent intervals, then each value of 
X may be taken as a mode and the set of observations may be spoken of as 
bimodal. Consider the observations 11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14,
14, 14, 15, 15, 15, 15, 16, 16, 16, 17, 17, 18. Here the value 13 occurs five 
times, and this is greater than the frequency of occurrence of the adjacent 
values. Also 15 occurs four times, and this is also greater than the frequency 
of occurrence of the adjacent values. This set of observations may be said 
to be bimodal. 

With data grouped in the form of a frequency distribution the mode is 
taken as the mid-point of the class interval with the largest frequency. 

The mode is a statistic of limited practical value. It does not lend itself 
readily to algebraic manipulation. It has little meaning unless the number 
of measurements under consideration is fairly large. 
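
The rules given in this section for ungrouped data may be sketched in Python. The function `modes` is a hypothetical helper, and it rests on two assumptions not stated in the text: values are "adjacent" when they are neighbors among the distinct observed values, and a run of more than two adjacent values tied at a peak is averaged in the same way as two.

```python
from collections import Counter


def modes(observations):
    """Return a list of modes: empty if none, two or more if multimodal."""
    counts = Counter(observations)
    values = sorted(counts)
    freqs = [counts[v] for v in values]
    if len(set(freqs)) == 1:
        return []  # all values occur equally often: no modal value
    result = []
    i = 0
    while i < len(values):
        # Extend j over a run of adjacent values sharing the same frequency.
        j = i
        while j + 1 < len(values) and freqs[j + 1] == freqs[i]:
            j += 1
        left_lower = i == 0 or freqs[i - 1] < freqs[i]
        right_lower = j == len(values) - 1 or freqs[j + 1] < freqs[i]
        if left_lower and right_lower:
            # A peak: take the mean of the tied adjacent values.
            result.append(sum(values[i:j + 1]) / (j - i + 1))
        i = j + 1
    return result


single = [11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14,
          15, 15, 15, 16, 16, 17, 17, 18]
none = [2, 7, 16, 19, 20, 25, 27]
tied = [11, 11, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14,
        15, 15, 16, 16, 17, 18]
bimodal = [11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14,
           15, 15, 15, 15, 16, 16, 16, 17, 17, 18]

print(modes(single), modes(none), modes(tied), modes(bimodal))
```

The four data sets are those used as illustrations in this section.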


3.13. Comparison of the Mean, Median, and Mode 


If we represent a frequency distribution graphically, the mean is a point on 
the horizontal axis which corresponds to the centroid, or center of gravity, of 
the distribution. If a cutout of the distribution were made from heavy 
cardboard and balanced on a knife edge, the point of balance would be the 
mean. The median is a point on the horizontal axis where the ordinate 
divides the total area under the curve into two equal parts. Half the area 
falls to the left and half to the right of the ordinate at the median. The mode 
is a point on the horizontal axis which corresponds to the highest point of the 
curve. 

Where the frequency distribution is symmetrical, the mean, median, and 
mode coincide. Where the frequency distribution is skewed, these three 
measures do not coincide. Figure 3.1 shows the mean, median, and mode for 
a positively skewed frequency distribution. 

We note that the mean is greater than the median, which in turn is greater 
than the mode. Where the distribution is negatively skewed the reverse 
relation holds. 

In most situations where a measure of central location is required the 
arithmetic mean is to be preferred to either the median or the mode. It is 
rigorously defined, easily calculated, and more amenable to algebraic treatment.
It also provides a better estimate of the corresponding population
parameter. 

The median is, however, to be preferred in certain situations. On occasion 
observations occur which appear to be atypical of the remaining observations 
in the set. Such observations may greatly affect the value of the mean. 
Under these circumstances the median is the more appropriate measure. 
Consider the observations 2, 3, 3, 4, 7, 9, 10, 11, 86. The observation 86 is 
atypical of the remaining observations, and its presence greatly affects the 
value of the mean. The mean is 15, a value greater than eight of the nine 
observations. The median is 7. Intuitively, in this case, the median 
appears to be the more appropriate measure. Note that in this example the
set of observations is grossly asymmetrical. In many situations where the
distribution of the variable shows gross asymmetry the median is to be preferred.

The mode is the appropriate statistic where the most frequently occurring
value is required. It is rarely used.

FIG. 3.1. Relation between the mean, median, and mode in a positively skewed frequency
distribution. (Vertical axis: frequency; the mode, median, and mean are marked in that
order along the horizontal axis.)
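
The effect of an atypical observation on the mean and the median may be seen in a short Python sketch using the standard `statistics` module. The variable names are illustrative only; the data are the nine observations discussed above.

```python
import statistics

observations = [2, 3, 3, 4, 7, 9, 10, 11, 86]
without_extreme = observations[:-1]   # drop the atypical observation 86

mean_all = statistics.mean(observations)
median_all = statistics.median(observations)
below_mean = sum(1 for x in observations if x < mean_all)

mean_rest = statistics.mean(without_extreme)
median_rest = statistics.median(without_extreme)

print(mean_all, median_all, below_mean)
print(mean_rest, median_rest)
```

Removing the single observation 86 changes the mean from 15 to 6.125, while the median moves only from 7 to 5.5.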


3.14. Other Measures of Central Location 


Another measure of central location is the geometric mean. The geometric 
mean of two numbers is the square root of their product; of three numbers it 
is the cube root of their product. Thus the geometric mean of 16 and 4 is
$\sqrt{16 \times 4}$, or 8. In general, the geometric mean of a set of $N$ values is the
$N$th root of their product. This may be written as

$$GM = \sqrt[N]{X_1 X_2 X_3 \cdots X_N} \tag{3.10}$$


The geometric mean cannot be computed when any value of X is zero or 
negative. All values of X must be positive and greater than zero. 




The geometric mean is the appropriate average where the observations 
are measures of rate of change. It has application to much growth data. 
To illustrate, consider a boom town with a rapidly growing population. The 
population in 1952 is 2,000, in 1953 it is 9,000, and in 1954 it is 18,000. The 
population in 1953 is 4.5 times the 1952 population, and the 1954 population 
is 2 times the 1953 population. The appropriate average for describing the 
average rate of increase over the two-year period is the geometric mean, which 
is $\sqrt{4.5 \times 2}$, or 3. The 1952 population is 2,000, and the 1954 population
is 3 times 3 times this figure, or 18,000. The arithmetic mean in this case is
(4.5 + 2)/2, or 3.25. Were we to state that the mean rate of change over 
the two-year period was 3.25, and not 3, we should be led to the observation 
that the population in 1954 was 3.25 times 3.25, or 10.56 times the 1952 
population, an observation which quite obviously does not correspond to 
the facts. 
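
The boom-town illustration may be reproduced in Python. The function `geometric_mean` below is a minimal sketch written for this example, not a reference implementation; it takes the Nth root of the product directly, which is adequate for a few values of moderate size.

```python
import math


def geometric_mean(values):
    """Nth root of the product of N positive values (Eq. 3.10)."""
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive and greater than zero")
    return math.prod(values) ** (1 / len(values))


populations = [2000, 9000, 18000]   # 1952, 1953, 1954
rates = [later / earlier for earlier, later in zip(populations, populations[1:])]

gm = geometric_mean(rates)
am = sum(rates) / len(rates)

print(rates, gm, am)
# Applying each average rate over the two years to the 1952 population:
print(populations[0] * gm ** 2, populations[0] * am ** 2)
```

Only the geometric mean, applied twice, recovers the 1954 population of 18,000; the arithmetic mean of the rates gives 21,125.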

Another type of average is the harmonic mean, which is given by the
formulation

$$HM = \frac{1}{\dfrac{1}{N}\sum_{i=1}^{N} \dfrac{1}{X_i}} = \frac{N}{\sum_{i=1}^{N} \dfrac{1}{X_i}} \tag{3.11}$$

This expression states that the harmonic mean is the reciprocal of the arithmetic
mean of the reciprocals of the observations. Again, all values of X
must be positive and greater than zero. To illustrate, the reciprocals of the
observations 2, 5, and 10 are 1/2, 1/5, and 1/10. The sum of these reciprocals is
8/10, or 4/5, and the mean is 4/15. The harmonic mean is then 15/4, or 3.75.

The harmonic mean is used in averaging ratios. A typist, for example, in
typing a 400-word letter may type the first 200 words in 20 min and the
second 200 words in 10 min. The rate for the first 200 words is 10 words per
minute, and for the second 200 words the rate is 20 words per minute. These
rates are ratios of words typed to time. The average typing speed in this
case is given by the harmonic mean, which is 2/(1/10 + 1/20), or 13.33 words
per minute. This result can be readily checked by considering the total
words typed divided by total time, that is, 400/30, or 13.33 words per minute.
If we take the arithmetic mean of the two rates, (20 + 10)/2, or 15, a spurious
result is obtained, since it is obvious in this case that total words typed divided
by total time is the average required. Note that in this example the words
typed for the two periods are constant and the times are variable. Had the
situation been the reverse, the words typed being variable and the times
constant, the arithmetic mean would be the appropriate average.

Both the geometric mean and the harmonic mean have limited and rather
specialized applications in psychological work, and detailed consideration of
these applications need not detain us.
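
The harmonic mean and the typing illustration may be checked with a short Python sketch. The function `harmonic_mean` is written here for illustration only.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (Eq. 3.11)."""
    if any(x <= 0 for x in values):
        raise ValueError("all values must be positive and greater than zero")
    return len(values) / sum(1 / x for x in values)


hm_example = harmonic_mean([2, 5, 10])

words = [200, 200]   # words typed in each period
minutes = [20, 10]   # time taken for each period
rates = [w / m for w, m in zip(words, minutes)]

hm_rate = harmonic_mean(rates)
overall_rate = sum(words) / sum(minutes)   # total words over total time
am_rate = sum(rates) / len(rates)          # the spurious average

print(round(hm_example, 2), round(hm_rate, 2), round(overall_rate, 2), am_rate)
```

The harmonic mean of the two rates agrees with total words divided by total time, whereas the arithmetic mean of the rates does not.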




EXERCISES 
1. In 100 rolls of a die the frequencies of the six possible events are as follows: 
Xi fi 


- t3 ш ی ھک‎ с 
D 
e 


Compute the weighted arithmetic mean for this distribution. 
2. The following is a frequency distribution of examination marks: 


Class interval fi 


90-94 1 
85-89 4 
80-84 2 
75-79 8 
70-74 9 


- 


65-69 


Compute the arithmetic mean.
3. How does the addition of a constant and multiplication by a constant affect the arithmetic mean?
4. For the following data determine the mean of the combined groups:


(à) mn = 12 X200 ()m-25 X,=10 
n = 8 X, = 40 пз = 75 1, = 50 


5. The sum of squares of deviations of 10 observations from a mean of 50 is 225. What is
the sum of squares of deviations from an arbitrary origin of 60 [Eq. (3.8)]?

6. Compute medians for the following data: 
(a) 3, 7, 15, 26, 51 (d) 12, 19, 24, 24, 36, 42 
(b) 3, 9, 22, 25, 31, 46 (e) 4, 4, 5, 5, 6 


(c) 6, 25, 31, 31, 45, 64 
7. Compute modes for the following data: 


(a) 2, 2, 5, 5, 5, 6, 6, 6, 7, 8, 12 
(b) 3, 3, 4, 4, 4, 5, 7, 7, 9, 12 
8. Compute medians and modes for the data in Exercises 1 and 2 above. 
9. What are the geometric and harmonic means of the numbers 1, 2, 3, and 4? 


CHAPTER 4 


MEASURES OF VARIATION, SKEWNESS, 
AND KURTOSIS 


4.1. Introduction 


Of great concern to the statistician is the variation in the events of nature. 
The variation of one measurement from another is a persisting characteristic 
of any sample of measurements. Measurements of intelligence, eye color, 
reaction time, and skin resistance, for example, exhibit variation in any 
sample of individuals. Anthropometric measurements such as height, 
weight, diameter of the skull, length of the forearm, and angular separation 
of the metatarsals show variation between individuals. Anatomical and 
physiological measurements vary; also the measurements made by the 
physicist, chemist, botanist, and agronomist. Statistics has been spoken of 
as the study of variation. Fisher (1948) has observed, “The conception of 
statistics as the study of variation is the natural outcome of viewing the 
subject as the study of populations; for a population of individuals in all 
respects identical is completely described by a description of any one indi- 
vidual, together with the number in the group. The populations which are 
the object of statistical study always display variation in one or more 
respects." The experimental scientist is frequently concerned with the 
different circumstances, conditions, or sources which contribute to the 
variation in the measurements he obtains. The analysis of variance (Chap. 
15) developed by Fisher is an important statistical procedure whereby the 
variation in a set of experimental data can be partitioned into components 
which may be attributed to different causal circumstances. 

How may the variation in any set of measurements be described? Con- 
sider the following measurements for two samples: 


Sample A    10   12   15   18   20
Sample B     2    8   15   22   28


We note that the two samples have the same mean, namely, 15. Simple 
inspection indicates, however, that the measurements in sample B are more 
variable than those in sample A; they differ more one from another. Among
the possible measures used to describe this variation are the range, the mean
deviation, and the standard deviation. The most important of these is the
standard deviation.


4.2. The Range 


The range is the simplest measure of variation. In any sample of measure- 
ments the range is taken as the difference between the largest and smallest 
measurements. The range for the measurements 10, 12, 15, 18, and 20 is 
20 minus 10, or 10. The range for the measurements 2, 8, 15, 22, and 28 is 
28 minus 2, or 26. The measurements in the second set quite clearly exhibit 
greater variation than those in the first set, and this reflects itself in a much 
greater range. The range has two disadvantages. First, for large samples 
it is an unstable descriptive measure. Consequently it should be used with 
small samples only, preferably 10 or less. The sampling variance of the 
range for small samples is not much greater than that of the standard 
deviation but increases rapidly with increase in N. Second, the range is 
not independent of sample size, except under special circumstances. For 
distributions that taper to zero at the extremities a better chance exists of 
obtaining extreme values for large than for small samples. Consequently, 
ranges calculated on samples composed of different numbers of cases are 
not directly comparable. Despite these disadvantages the range may be 
effectively used in the application of tests of significance with small samples. 
For a discussion of such tests the reader is referred to Fryer (1954) and 
Lindzey (1954, Chap. 8). 
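
The range, and its dependence on sample size, may be illustrated in Python. This is a sketch under stated assumptions: the simulation draws samples from a standard normal distribution with a fixed seed, and the helper names `value_range` and `mean_range`, the sample sizes 5 and 50, and the number of trials are all choices made here for illustration.

```python
import random


def value_range(values):
    """Difference between the largest and smallest measurements."""
    return max(values) - min(values)


def mean_range(sample_size, trials=500, seed=0):
    """Average range over repeated normal samples of a given size."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(sample_size)]
        total += value_range(sample)
    return total / trials


sample_a = [10, 12, 15, 18, 20]
sample_b = [2, 8, 15, 22, 28]
print(value_range(sample_a), value_range(sample_b))

# Larger samples from the same population tend to show a larger range.
small = mean_range(5)
large = mean_range(50)
print(small < large)
```

The comparison illustrates why ranges calculated on samples of different sizes are not directly comparable.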


4.3. The Mean Deviation 


Consider the following measurements: 


Sample A     8    8    8    8    8
Sample B     1    4    7   10   13
Sample C     1    5   20   25   29


Intuitively, the measurements in sample A are less variable than those in B,
which in turn are less variable than those in C. Indeed, the measurements
in A exhibit no variation at all. The means of the three samples are 8, 7,
and 16. If we express the measurements as deviations from their sample
means, we obtain


Sample A     0    0    0    0    0
Sample B    -6   -3    0   +3   +6
Sample C   -15  -11   +4   +9  +13

Inspection of these numbers suggests that as variation increases, the departure
of the observations from their sample mean increases. We may use
this characteristic to define a measure of variation. One such measure
is the mean deviation. The mean deviation is the arithmetic mean of the
absolute deviations from the arithmetic mean. An absolute deviation
is a deviation without regard to algebraic sign. To obtain the mean
deviation we simply calculate the deviations from the arithmetic mean,
sum these, disregarding algebraic sign, and divide by N. For sample
A above, the mean deviation is zero. For sample B the mean deviation is
(6 + 3 + 0 + 3 + 6)/5 = 18/5 = 3.6. For sample C the mean deviation
is (15 + 11 + 4 + 9 + 13)/5 = 52/5 = 10.4.
The mean deviation is given in algebraic language by the formula

$$MD = \frac{\sum |X - \bar{X}|}{N} \tag{4.1}$$

Here $X - \bar{X}$ is a deviation from the mean and $|X - \bar{X}|$ is a deviation without
regard to algebraic sign. The bars mean that signs are ignored.

Hitherto, symbols above and below the summation sign $\sum$ have been used
to indicate the limits of the summation. In the above formula for the mean
deviation these symbols have been omitted, the summation being clearly
understood to extend over the $N$ members in the sample. In this and
subsequent chapters symbols indicating the limits of summation will, for
convenience, be omitted where these are understood clearly from the context
to extend over N sample members. Where any possibility of doubt could 
exist, the symbols above and below the summation sign will be inserted. 

The mean deviation is infrequently used. It is not readily amenable to 
algebraic manipulation. This circumstance stems from the use of absolute 
values. In general, in statistical work the use of absolute values should be 
avoided, if at all possible. It is of interest to note that the sum of abso- 
lute deviations about the median is a minimum. Consider the numbers 
1, 5, 20, 25, 29. The median is 20. The sum of absolute deviations is
19 + 15 + 0 + 5 + 9 = 48. The corresponding sum of deviations about
any other origin, say, 19, will be greater than 48. The sum of absolute
deviations about the origin 19 is 18 + 14 + 1 + 6 + 10 = 49. The median
could be defined as that value about which the sum of absolute deviations 
is a minimum. 
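
The mean deviation, and the minimum property of the median, may be checked in Python. The function names are assumptions made for this sketch; the samples are those used in this section.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean (Eq. 4.1)."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def sum_absolute_deviations(values, origin):
    """Sum of deviations about an origin, disregarding algebraic sign."""
    return sum(abs(x - origin) for x in values)


sample_a = [8, 8, 8, 8, 8]
sample_b = [1, 4, 7, 10, 13]
sample_c = [1, 5, 20, 25, 29]

print(mean_deviation(sample_a), mean_deviation(sample_b), mean_deviation(sample_c))

about_median = sum_absolute_deviations(sample_c, 20)
about_19 = sum_absolute_deviations(sample_c, 19)
print(about_median, about_19)

# Among whole-number origins from 1 to 29 the smallest sum occurs at the median.
best_origin = min(range(1, 30), key=lambda o: sum_absolute_deviations(sample_c, o))
print(best_origin)
```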


4.4. The Standard Deviation 


Some of the deviations about the mean are positive, others are negative. 
The sum of deviations is zero. One method of dealing with the presence of 
the negative sign is to use absolute deviations, as in the calculation of the 
mean deviation. An alternate, and in general preferable, procedure is to 
square the deviations. One measure of variation which employs the squares 
of deviations from the mean is the standard deviation. To calculate the
standard deviation we add together the squares of deviations about the
mean, divide by N, and take the square root. Thus by definition the 
standard deviation is the square root of the mean of the squared deviations. 
To illustrate, the mean of the measurements 1, 4, 7, 10, and 13 is 7. The 
deviations from the mean are -6, -3, 0, +3, and +6. The squares of these
deviations are 36, 9, 0, 9, and 36. The sum of squares is 90. We divide by
N to obtain the mean of the squared deviations. This is 90/5 = 18. The standard
deviation is the square root of this quantity and is $\sqrt{18}$ = 4.24.
In algebraic language the standard deviation is given by the formula 


$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} \tag{4.2}$$


Throughout this book the symbol s is used to refer to the sample standard
deviation. In the above formula $X - \bar{X}$ is a deviation from the mean and
N is the number of measurements. The above formula defines the standard
deviation. It has no derivation. It is obtained through a process of
plausible reasoning, but this process is not a derivation in the usual
mathematical sense.

The square of the standard deviation, $s^2$, is called the variance. The
variance is the mean of the squared deviations, or

$$s^2 = \frac{\sum (X - \bar{X})^2}{N} \tag{4.3}$$


The standard deviation is the most important and most frequently used 
measure of variation. It has many uses. In any experiment the standard 
deviations should be subject to careful scrutiny because the important 
information in the data may frequently reside not in the differences between 
means, but in differences in variation. 

To meaningfully compare the standard deviations of two sets of measure- 
ments requires that the measurements be of the same kind. Thus it is 
meaningful to compare the standard deviations of IQ's for male and female 
students, or the standard deviations of error scores made by two groups of 
experimental animals in running a maze, or the standard deviations of weight 
or height for groups of school children under different diets. It is not 
meaningful to compare the standard deviation of IQ's for a group of children, 
measured in units on an IQ scale, with the standard deviation of heights
measured in inches. The question, are children more variable in intelligence
than they are in height, is not meaningful.


4.5. Calculating the Standard Deviation from Ungrouped Data 


For purposes of calculation, particularly where a machine computer is 
available, it is convenient to write the variance and the standard deviation
in a different form. We may write the variance

$$
\begin{aligned}
s^2 &= \frac{\sum (X - \bar{X})^2}{N} = \frac{\sum (X^2 + \bar{X}^2 - 2X\bar{X})}{N} \\
    &= \frac{\sum X^2}{N} + \frac{N\bar{X}^2}{N} - \frac{2\bar{X}\sum X}{N} \\
    &= \frac{\sum X^2}{N} + \bar{X}^2 - 2\bar{X}^2 \\
    &= \frac{\sum X^2}{N} - \bar{X}^2
\end{aligned}
$$

The standard deviation is then

$$s = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2} \tag{4.4}$$


Thus to calculate the standard deviation using this formula we sum the 
squares of the original observations, divide by N, subtract from this the 
square of the mean, and then take the square root. For example, the 
observations 1, 4, 7, 10, 13 have a mean of 7. The squares of these observa- 
tions are 1, 16, 49, 100, and 169. The sum of these squared observations is 
335. The variance is then

$$s^2 = \frac{335}{5} - 7^2 = 67 - 49 = 18$$

and the standard deviation is $\sqrt{18}$, or 4.24.
A formula closely related to 4.4, which has certain computational advantages, is

$$s = \frac{1}{N} \sqrt{N \sum X^2 - \left(\sum X\right)^2} \tag{4.4a}$$

This formula requires one operation of division only.

In computing a standard deviation on a calculating machine the measure- 
ments are entered on the machine and the sum and sum of squares of measure- 
ments obtained in a single operation. This yields the information required 
to calculate the standard deviation. In most cases it is advisable to repeat 
the operation as a check. 
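The equivalence of the defining formula (4.2) and the computational formulas (4.4) and (4.4a) may be illustrated with a short Python sketch. The function names are chosen here for illustration and are not taken from the text. All three divide by N, as the text does, not by N - 1.

```python
import math

def sd_definition(xs):
    """Formula (4.2): root of the mean squared deviation from the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_raw_scores(xs):
    """Formula (4.4): mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - mean ** 2)

def sd_single_division(xs):
    """Formula (4.4a): needs only the sum and the sum of squares."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return math.sqrt(n * sum_x2 - sum_x ** 2) / n

xs = [1, 4, 7, 10, 13]
print(sd_definition(xs), sd_raw_scores(xs), sd_single_division(xs))
```

The three results agree to within floating-point error. The standard library function `statistics.pstdev`, which also uses the divisor N, may serve as a further check.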


4.6. Calculating the Standard Deviation from a Frequency 
Distribution 


The formula used in calculating the standard deviation s from data grouped 
in the form of a frequency distribution is 


$$s = h \sqrt{\frac{\sum fx'^2}{N} - \left(\frac{\sum fx'}{N}\right)^2} \tag{4.5}$$

where h = class interval
      f = frequencies
      x' = computation variable
Application of this formula is illustrated with reference to Table 4.1. 


TABLE 4.1
CALCULATING THE STANDARD DEVIATION FOR FREQUENCY DISTRIBUTION OF TEST SCORES

   (1)          (2)           (3)             (4)              (5)
  Class      Frequency    Computation    Frequency by     Frequency by square
 interval        f         variable      computation      of computation
                              x'         variable fx'     variable fx'²

  45-49          1             5               5                 25
  40-44          2             4               8                 32
  35-39          3             3               9                 27
  30-34          6             2              12                 24
  25-29          8             1               8                  8
  20-24         17             0               0                  0
  15-19         26            -1             -26                 26
  10-14         11            -2             -22                 44
   5-9           2            -3              -6                 18
   0-4           0            -4               0                  0

  Total         76                           -12                204

$$s = 5 \sqrt{\frac{204}{76} - \left(\frac{-12}{76}\right)^2} = 8.16$$

First, select an arbitrary origin near the middle of the distribution and
write down the computation variable x'. Second, multiply the frequencies
by the computation variable with due regard to sign to obtain the products
$fx'$ as shown in col. 4. Third, multiply the products $fx'$ of col. 4 by the
computation variable x' to obtain the values $fx'^2$ in col. 5. Fourth, sum
cols. 4 and 5 to obtain $\sum fx'$ and $\sum fx'^2$, in this case -12 and 204, respectively.
$\sum fx'$ and $\sum fx'^2$ are the sum and sum of squares of deviations about the
arbitrary origin in units of class interval. Fifth, substitute the values 
obtained in formula (4.5) above. This formula involves the conversion of a 
sum of squares of deviations about an arbitrary origin, in units of class 
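interval, to a sum of squares of deviations about the actual mean in original
units. Thus

$$s = h \sqrt{\frac{\sum fx'^2}{N} - \left(\frac{\sum fx'}{N}\right)^2} = 5 \sqrt{\frac{204}{76} - \left(\frac{-12}{76}\right)^2} = 8.16$$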


In summary, the steps are:

1. Select an arbitrary origin and write down the computation variable. 

2. Multiply the frequencies by the computation variable to obtain the 
products fx’. 

3. Multiply these products by the computation variable to obtain the
products $fx'^2$.

4. Sum $fx'$ and $fx'^2$ to obtain $\sum fx'$ and $\sum fx'^2$.

5. Apply formula for calculating s from grouped data. 

To check the result, the calculation may be repeated using a different
arbitrary origin. 
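The five steps may be sketched in Python using the frequencies of Table 4.1. The function `grouped_sd` and its argument `origin_lower` (the lower limit of the interval chosen as arbitrary origin) are names introduced for this illustration only. The sketch assumes intervals of equal width h.

```python
import math

# Table 4.1 as (lower limit of class interval, frequency); class interval h = 5.
table = [(45, 1), (40, 2), (35, 3), (30, 6), (25, 8),
         (20, 17), (15, 26), (10, 11), (5, 2), (0, 0)]
h = 5

def grouped_sd(table, h, origin_lower):
    """Formula (4.5) with the arbitrary origin placed in the interval
    whose lower limit is origin_lower."""
    n = sum(f for _, f in table)
    # Step 1: computation variable x' = distance from the origin in units of h.
    coded = [((lower - origin_lower) // h, f) for lower, f in table]
    # Steps 2 to 4: form the products fx' and fx'^2 and sum them.
    sum_fx = sum(f * x for x, f in coded)
    sum_fx2 = sum(f * x * x for x, f in coded)
    # Step 5: substitute in the formula.
    s = h * math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)
    return n, sum_fx, sum_fx2, s

n, sum_fx, sum_fx2, s = grouped_sd(table, h, origin_lower=20)
_, _, _, s_check = grouped_sd(table, h, origin_lower=0)   # a different origin
print(n, sum_fx, sum_fx2, round(s, 2), round(s_check, 2))
```

With the origin in the interval 20 to 24 the sketch reproduces the column totals 76, -12, and 204. It gives a standard deviation of about 8.15, which agrees with the 8.16 of the text except in the last digit, presumably because of rounding in the hand calculation. Moving the origin to the lowest interval leaves the result unchanged.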


4.7. The Machine Calculation of the Standard Deviation from 
Grouped Data 


In calculating the standard deviation from grouped data using a calculating 
machine the following procedure has some advantages: 

1. Select the mid-point of the lowest class interval as the arbitrary origin. 

2. Write down the computation variable 0, 1, 2, 3, and so on, opposite 
the frequencies. 

3. Enter the values of the computation variable on the left of the keyboard, 
the square of these values on the right of the keyboard, and multiply by the 
frequencies, allowing the machine to accumulate the products. Thus the 
quantities $\sum fx'$ and $\sum fx'^2$ are obtained directly.

4. Use these quantities in the usual way in the formula for computing the 
standard deviation from grouped data. 


4.8. Effects of Grouping 


In calculating the mean and standard deviation all observations in any 
class interval are assigned a value equal to the mid-point of the interval. 
With distributions that taper off to zero at the extremities, the point of 
concentration of the values within any interval is not the mid-point of the 
interval but is usually a point slightly nearer the mean. Thus the mean of 
the original observations within any class interval will tend to be a little bit 
closer to the mean of the distribution as a whole than the mid-point of the 
interval. 

In computing the mean from grouped data, grouping exerts no systematic 
effect because errors resulting from the assumption that the observations 
are concentrated at the mid-point of each interval tend to balance, the errors 
on one side of the mean being positive and those on the other negative. Thus 
a mean calculated from grouped data may be expected to differ very little 
from that calculated from ungrouped data. 

The standard deviation, however, involves the squaring of deviations
about the mean. In consequence, the errors of grouping on either side of
the mean do not tend to cancel each other but add together. Thus a standard 
deviation calculated from grouped data will tend to be larger than a standard 
deviation calculated from the original ungrouped observations. A cor- 
rection, known as Sheppard's correction for grouping, may be applied to the 
standard deviation. The formula for this correction is as follows: 


$$s_c = \sqrt{s^2 - \frac{h^2}{12}} \tag{4.6}$$

where $s_c$ = corrected standard deviation
      s = uncorrected standard deviation
      h = class interval

Where the class interval is small, the effects of grouping on the standard
deviation are not great and the corrected value will differ only slightly from
the uncorrected value. If the class interval is large, the effects of grouping 
may be substantial. Sheppard's correction is applicable only to continuous 
variables whose distributions are roughly normal in form. It is not appli- 
cable to rectangular, J-shaped, or U-shaped distributions. The correction 
should be used in all cases where an accurate estimate of the population 
standard deviation is required. It should not be used in the application of 
certain tests of significance, a point to be discussed in Chap. 10. 
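As an illustration only, Sheppard's correction may be applied to the grouped standard deviation of Table 4.1, taking s = 8.16 and h = 5 from the text. The function name is hypothetical. Whether the correction is appropriate for those data depends on the conditions stated above, which the sketch does not test.

```python
import math

def sheppard_corrected_sd(s, h):
    """Formula (4.6): subtract h^2 / 12 from the variance, then take the root."""
    return math.sqrt(s ** 2 - h ** 2 / 12)

s_corrected = sheppard_corrected_sd(8.16, 5)        # class interval of 5
s_small_interval = sheppard_corrected_sd(8.16, 1)   # class interval of 1

print(round(s_corrected, 2), round(s_small_interval, 2))
```

With a class interval of 5 the corrected value is somewhat smaller than 8.16. With a class interval of 1 the correction is very slight, in line with the remark that a small interval has little effect.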


4.9. The Effect on the Standard Deviation of Adding or 
Multiplying by a Constant 


If a constant is added to all the observations in a sample, the standard deviation 
remains unchanged. An examiner may conclude, for example, that an 
examination is too difficult. He may decide to add 10 marks to all the marks 
assigned. The standard deviation of the original marks will be the same as 
the standard deviation of marks with the 10 marks added. This result 
follows directly from the fact that if X is an observation, the corresponding
observation with the constant c added is X + c. If $\bar{X}$ is the mean of the
original observations, the mean with the constant added is $\bar{X} + c$. A deviation
from the mean of the observations with the constant added is then
$(X + c) - (\bar{X} + c)$, which is readily observed to be equal to $X - \bar{X}$.
Since the deviations about the mean are unchanged by the addition of a
constant, the standard deviation will remain unchanged. To illustrate, by
adding a constant, say, 5, to the measurements 1, 4, 7, 10, and 13 we obtain
6, 9, 12, 15, and 18. The mean of the original measurements is 7, and the
mean of the measurements with the constant added is 7 + 5, or 12. The
deviations from the mean are in both instances the same, namely, -6,
-3, 0, +3, and +6. The standard deviation in both instances is 4.24.




If all measurements in a sample are multiplied by a constant, the standard
deviation is also multiplied by the absolute value of that constant. If the
standard deviation of examination marks is 4 and all marks are multiplied
by the constant 3, then the standard deviation of the resulting marks is
3 × 4 = 12. To demonstrate this result we observe that if $\bar{X}$ is the mean of
a sample of measurements, the mean of the measurements multiplied by c
is $c\bar{X}$. A deviation from the mean is then

$$(cX - c\bar{X}) = c(X - \bar{X})$$

By squaring, summing over N observations, and dividing by N, we obtain

$$\frac{\sum (cX - c\bar{X})^2}{N} = \frac{c^2 \sum (X - \bar{X})^2}{N} = c^2 s^2 \tag{4.7}$$


Thus if all measurements are multiplied by a constant c, the variance is
multiplied by $c^2$ and the standard deviation by the absolute value of c. If
c is a negative number, say, -3, s is multiplied by the absolute value 3. By
way of illustration the measurements 1, 4, 7, 10, 13 have a mean of 7, a
variance of 18, and a standard deviation of 4.24. If the measurements
are multiplied by the constant 5, we obtain 5, 20, 35, 50, 65. The mean is
now 5 × 7, or 35. The deviations from the mean are -30, -15, 0, +15,
+30. Squaring these we obtain 900, 225, 0, 225, 900. The sum of squares
is 2,250, the variance is 450, and the standard deviation is 21.21, whereas
5 times the original standard deviation of 4.24 is 21.20. The slight discrepancy
results from the rounding of decimals.
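Both rules may be verified with a brief Python sketch. It relies on `statistics.pstdev` from the standard library, which divides by N as formula (4.2) does. The variable names are illustrative.

```python
from statistics import pstdev

xs = [1, 4, 7, 10, 13]

s = pstdev(xs)                              # original standard deviation
s_shifted = pstdev([x + 5 for x in xs])     # constant 5 added
s_scaled = pstdev([5 * x for x in xs])      # multiplied by 5
s_negated = pstdev([-3 * x for x in xs])    # multiplied by -3

print(round(s, 2), round(s_shifted, 2), round(s_scaled, 2), round(s_negated, 2))
```

Adding the constant leaves s unchanged, multiplying by 5 gives 5s, and multiplying by -3 gives 3s. No rounding discrepancy of the kind noted above appears, because the machine carries more decimals than the hand calculation.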


4.10. Standard Deviation of the First N Integers 


We state without proof that the sum of squares of the first N integers is

$$\frac{N(N + 1)(2N + 1)}{6} \tag{4.8}$$

and the standard deviation of the first N integers is

$$s = \sqrt{\frac{N^2 - 1}{12}} \tag{4.9}$$


Consider the integers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Applying the above 
formulas, the sum of squares is 385 and the standard deviation is 2.87. 
These results may be readily checked by direct calculation. 

These formulas are of particular use in relation to problems involving ranks
(Chap. 12). Where ranks are used the observations are represented by the
first N integers.
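The direct check mentioned for the integers 1 to 10 may be carried out as follows. This is a minimal Python sketch; the two function names are introduced here and simply restate formulas (4.8) and (4.9).

```python
import math
from statistics import pstdev

def sum_of_squares_first_n(n):
    """Formula (4.8)."""
    return n * (n + 1) * (2 * n + 1) // 6

def sd_first_n(n):
    """Formula (4.9)."""
    return math.sqrt((n * n - 1) / 12)

n = 10
integers = list(range(1, n + 1))

direct_sum_of_squares = sum(i * i for i in integers)
direct_sd = pstdev(integers)   # divisor N, as in formula (4.2)

print(sum_of_squares_first_n(n), direct_sum_of_squares)
print(round(sd_first_n(n), 2), round(direct_sd, 2))
```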


4.11. The Standard Deviations of Combined Groups 


Circumstances arise where we know the means and standard deviations of 
two samples of measurements and may wish to determine the standard 
deviations of the two samples combined. We may, for example, have means 
and standard deviations of examination marks for two classes of university 
students and may wish to find the standard deviation of marks for all the 
individuals in the two classes. Let the number of cases, mean, and standard
deviation for one group be $n_1$, $\bar{X}_1$, and $s_1$ and for the other group $n_2$, $\bar{X}_2$,
and $s_2$. Let $\bar{X}$ and s be the mean and standard deviation of the combined
groups. Let $n_1 + n_2 = N$, $\bar{X}_1 - \bar{X} = d_1$, and $\bar{X}_2 - \bar{X} = d_2$. We state
without proof that the standard deviation of the combined group is

$$s = \sqrt{\frac{1}{N}\left(n_1 s_1^2 + n_2 s_2^2 + n_1 d_1^2 + n_2 d_2^2\right)} \tag{4.10}$$


This formulation can be generalized from two groups to any number of 
groups, say, k. 

To illustrate, the seven measurements 1, 6, 8, 10, 13, 18, and 21 have a 
mean of 11 and a variance of 41.14; that is, $n_1 = 7$, $\bar{X}_1 = 11$, and $s_1^2 = 41.14$.
The five observations 1, 4, 7, 10, 13 have a mean of 7 and a variance of 18; that
is, $n_2 = 5$, $\bar{X}_2 = 7$, and $s_2^2 = 18.0$. The mean of all 12 observations taken
together, the combined group, is 9.33. The quantity $d_1 = 11 - 9.33 = 1.67$,
and $d_2 = 7 - 9.33 = -2.33$. The variance of the combined groups is then

$$s^2 = \tfrac{1}{12}\left[7 \times 41.14 + 5 \times 18.0 + 7(1.67)^2 + 5(-2.33)^2\right] = 35.39$$

The standard deviation $s = \sqrt{35.39} = 5.95$. This result may be checked
by direct calculation. 
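The direct check may be sketched in Python. The function `combined_variance` is a name introduced here. It is written for any number of groups k, following the remark that formula (4.10) generalizes, and it takes a list of (n, mean, variance) triples.

```python
import math
from statistics import mean, pvariance

def combined_variance(groups):
    """Variance of k groups pooled together, from (n, mean, variance) triples."""
    big_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / big_n
    # Each group contributes n*s^2 (spread within the group)
    # plus n*d^2 (distance of the group mean from the combined mean).
    total = sum(n * v + n * (m - grand_mean) ** 2 for n, m, v in groups)
    return total / big_n

group_1 = [1, 6, 8, 10, 13, 18, 21]
group_2 = [1, 4, 7, 10, 13]

groups = [(len(g), mean(g), pvariance(g)) for g in (group_1, group_2)]
var_combined = combined_variance(groups)
var_direct = pvariance(group_1 + group_2)   # all 12 observations together

print(round(var_combined, 2), round(var_direct, 2),
      round(math.sqrt(var_combined), 2))
```

The value obtained from the group summaries agrees with the variance computed directly from the 12 observations.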


4.12. Standard Scores 


A deviation from the mean divided by the standard deviation is called 
a standard score and is represented by the symbol z. Thus

$$z = \frac{X - \bar{X}}{s} = \frac{x}{s} \tag{4.11}$$

Deviation scores $X - \bar{X}$, or x, have a mean of zero and a standard deviation
s. The subtraction of the mean from all measurements in a sample does not
change the standard deviation. Standard scores have zero mean and unit
standard deviation. As previously shown, if all measurements in a sample
are multiplied by a constant, the standard deviation is also multiplied by the
absolute value of that constant. The deviations from the mean $X - \bar{X}$ have a
standard deviation s. If all deviations are divided by s, which amounts to
multiplying by the constant 1/s, the standard deviation of the scores thus
obtained is s/s = 1.

To illustrate, the following observations have been expressed in raw-score,
deviation-score, and standard-score form.


Individual      X       x        z
A               3      -7     -1.21
B               6      -4      -.69
C               7      -3      -.52
D               9      -1      -.17
E              15       5       .87
F              20      10      1.73


Because standard scores have zero mean and unit standard deviation they 
are readily amenable to certain forms of algebraic manipulation. Many 
formulations can be derived more conveniently using standard scores than 
using raw or deviation scores. 

The use of standard scores means in effect that we are using the standard 
deviation as the unit of measurement. In the above example individual A 
is 1.21 standard deviations, or standard deviation units, below the mean, 
while individual F is 1.73 standard deviation units above the mean. 
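The standard scores in the example can be reproduced with a short Python sketch. The dictionary layout and variable names are chosen for illustration. `statistics.pstdev` is used because it divides by N, in keeping with formula (4.2).

```python
from statistics import mean, pstdev

raw = {"A": 3, "B": 6, "C": 7, "D": 9, "E": 15, "F": 20}

scores = list(raw.values())
m = mean(scores)
s = pstdev(scores)

deviation = {name: x - m for name, x in raw.items()}       # x = X - mean
standard = {name: (x - m) / s for name, x in raw.items()}  # z = x / s

for name in raw:
    print(name, raw[name], deviation[name], round(standard[name], 2))

z_values = list(standard.values())
print(round(mean(z_values), 10), round(pstdev(z_values), 10))
```

The last line confirms that the standard scores have zero mean and unit standard deviation, apart from floating-point error.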

Standard scores are frequently used to obtain comparability of observa- 
tions obtained by different procedures. Consider examinations in English 
and mathematics applied to the same group of individuals and assume the 
means and standard deviations to be as follows: 


                     $\bar{X}$      s
English              65       8
Mathematics          52      12


In effect, in relation to the performance of the individuals in the group, a 
score of 65 on the English examination is the equivalent of a score of 52 on 
the mathematics examination. Likewise, to illustrate, a score one standard 
deviation above the mean, that is, 65 + 8, or 73, on the English examination 
can be considered to be the equivalent of a score one standard deviation above 
the mean, that is, 52 + 12, or 64, on the mathematics examination. If an
individual makes a score of 57 on the English examination and a score of
58 on the mathematics examination, we may compare his relative performance
on the two subjects by comparing his standard scores. On English
his standard score is (57 - 65)/8 = -1.0, and on mathematics his standard
score is (58 - 52)/12 = .5. Thus on English his performance is one
standard deviation unit below the average, while on mathematics his per- 
formance is .5 standard deviation unit above the average. Quite clearly, 
this individual did much more poorly in English than in mathematics, relative 
to the performance of the group of individuals taking the examinations, 
although this is not reflected in the original marks assigned. To attain 
rigorous comparability of scores, the distributions of scores on the two tests 
should be identical in shape. The meaning of this statement will become 
clear as we proceed. 
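The comparison of the two examinations amounts to a few lines of arithmetic, sketched here in Python. The functions `standard_score` and `equivalent_score` are hypothetical helpers. The second converts a score on one examination to the score with the same standard score on the other. As noted above, it gives rigorous comparability only if the two distributions have the same shape.

```python
def standard_score(x, mean, sd):
    """z = (X - mean) / s."""
    return (x - mean) / sd

def equivalent_score(x, mean_from, sd_from, mean_to, sd_to):
    """Score on the second test having the same z as x has on the first."""
    return mean_to + standard_score(x, mean_from, sd_from) * sd_to

z_english = standard_score(57, mean=65, sd=8)
z_mathematics = standard_score(58, mean=52, sd=12)

# An English score of 73 lies one standard deviation above the English mean.
maths_equivalent_of_73 = equivalent_score(73, 65, 8, 52, 12)

print(z_english, z_mathematics, maths_equivalent_of_73)
```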


4.13. Advantages of the Standard Deviation as a Measure 
of Variation 


The standard deviation and variance have many advantages over other 
measures of variation. Many branches of statistical method involve their 
use. The sample standard deviation is a more stable or accurate estimate
of the population parameter than other measures. It provides a more stable 
estimate of the standard deviation in the population than the sample mean 
deviation, for example, does of the mean deviation in the population. This 
is one of the reasons why it has come to be accepted as the basic measure of 
variation. The standard deviation is more amenable to mathematical 
manipulation than other measures. It enters into the formulas for and 
computation of many types of statistics. It is used extensively as a measure 
of error. In later discussion on sampling statistics (Chap. 9) the reader will 
observe that the standard error is in effect the standard deviation of errors 
made in estimating population parameters from sample values. These 
errors result from the operation of chance factors in random sampling. A 
full appreciation of the importance and meaning of the standard deviation in 
its many ramifications requires considerable familiarity with statistical ideas. 


4.14. Moments 


The mean and the standard deviation are members of a family of descriptive
statistics known as moments. The first four moments about the arithmetic
mean are as follows:

$$
\begin{aligned}
m_1 &= \frac{\sum (X - \bar{X})}{N} = 0 \\
m_2 &= \frac{\sum (X - \bar{X})^2}{N} = s^2 \\
m_3 &= \frac{\sum (X - \bar{X})^3}{N} \\
m_4 &= \frac{\sum (X - \bar{X})^4}{N}
\end{aligned}
\tag{4.12}
$$

In general, the rth moment about the mean is given by 


$$m_r = \frac{\sum (X - \bar{X})^r}{N} \tag{4.13}$$


The term moment originates in mechanics. Consider a lever supported 
by a fulcrum. If a force $f_1$ is applied to the lever at a distance $x_1$ from the
origin, then $f_1 x_1$ is called the moment of the force. Further, if a second force
$f_2$ is applied at a distance $x_2$, the total moment is $f_1 x_1 + f_2 x_2$. If we square
the distances x, we obtain the second moment; if we cube them we obtain the
third moment; and so on. When we come to consider frequency distributions
the origin is the analogue of the fulcrum and the frequencies in the various 
class intervals are analogous to forces operating at various distances from 
the origin. Observe that the first moment about the mean is zero and the 
second moment is the variance. The third moment is used to obtain a 
measure of skewness, and the fourth moment a measure of kurtosis. 


4.15. Measures of Skewness and Kurtosis 


A commonly used measure of skewness may be obtained from the second 
and third moments and is defined as 


$$g_1 = \frac{m_3}{m_2 \sqrt{m_2}} \tag{4.14}$$


The rationale of this statistic resides in the observation that when a dis- 
tribution is symmetrical the sum of cubes of deviations above the mean will 
balance the sum of cubes of deviations below the mean. Thus when the 
distribution is symmetrical, $m_3 = 0$ and $g_1 = 0$. If the distribution has a
long tail to the right, the sum of cubes of deviations above the mean will be
greater than the corresponding sum below the mean. Under these circumstances,
where the distribution is positively skewed, $g_1$ is positive. Conversely,
where the distribution is negatively skewed, $g_1$ is negative. The
second moment is used in the denominator of the expression for $g_1$ in order to
make the measure independent of the scale of measurement. 

The most acceptable measure of kurtosis is obtained from the fourth and 
second moments and is defined as 


$$g_2 = \frac{m_4}{m_2^2} - 3 \tag{4.15}$$

When $g_2$ is zero, the distribution is a particular type of symmetrical distribution
known as the normal distribution. When $g_2$ is less than zero, the
distribution is flatter on top than the normal distribution. When $g_2$ is
greater than zero, the distribution is more peaked than the normal distribution.
The rationale underlying $g_2$ as a measure of kurtosis resides in the
fact that if two frequency distributions have the same standard deviation 
and one is more peaked than the other, the more peaked must of necessity 
have thicker tails. In consequence, the sum of the deviations about the mean 
raised to the fourth power will be greater for the more peaked than for the 
less peaked distribution. 
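The moments and the two measures derived from them may be sketched in Python. The printed formulas for $g_1$ and $g_2$ are damaged in this copy, so the sketch assumes the usual forms $g_1 = m_3/(m_2\sqrt{m_2})$ and $g_2 = m_4/m_2^2 - 3$. The function names and the second data set are introduced for illustration.

```python
import math

def central_moment(xs, r):
    """The rth moment about the mean, formula (4.13)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** r for x in xs) / n

def skewness_g1(xs):
    """Assumed form: third moment over the 3/2 power of the second."""
    m2 = central_moment(xs, 2)
    return central_moment(xs, 3) / (m2 * math.sqrt(m2))

def kurtosis_g2(xs):
    """Assumed form: fourth moment over the squared second moment, less 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 4, 7, 10, 13]     # the sample used earlier in the chapter
right_tailed = [1, 2, 3, 4, 20]   # an invented sample with a long right tail

print(central_moment(symmetric, 1), central_moment(symmetric, 2))
print(skewness_g1(symmetric), skewness_g1(right_tailed))
print(kurtosis_g2(symmetric))
```

For the symmetric sample the first and third moments are zero and the second moment is the variance, 18. The sample with a long tail to the right gives a positive $g_1$.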




EXERCISES


1. For the measurements 2, 5, 9, 10, 15, 19, compute the range, the average deviation,
and the standard deviation.


2. Why is the standard deviation preferred to the average deviation as a measure of
variation?


3. Compute the sum of squares about the arithmetic mean, the variance, and the standard
deviation for the following frequency distribution by selecting an arbitrary origin within
the interval 20 to 24.


Class interval    f


40-44
35-39
30-34 
25-29 
20-24 
15-19 
10-14 
5-9 
0-4 


س о с бо > UN о‏ سم 


N = 30 


Repeat the calculation using an arbitrary origin within the interval 15 to 19.


4. How does the addition of a constant to all the observations in a sample and multiplication
by a constant affect the standard deviation?


5. Calculate the mean and standard deviation of the first 100 integers.
6. For the following data obtain the variance and the standard deviation of the combined
groups:


(a) $n_1 = 6$   $\bar{X}_1 = 20$   $s_1^2 = 550$        (b) $n_1 = 14$   $\bar{X}_1 = 50$   $s_1^2 = 400$
    $n_2 = 4$   $\bar{X}_2 = 10$   $s_2^2 = 40$             $n_2 = 6$    $\bar{X}_2 = 75$   $s_2^2 = 250$


7. Express the measurements 4, 5, 9, 25, 7 in standard-score form.
8. Where $\bar{X} = 50$ and s = 10, express the scores 12, 86, 55, and 92 in standard-score form.
9. Show that $\sum_{i=1}^{N} z_i^2 = N$.
10. Compute the second, third, and fourth moments about the mean for the observations
4, 6, 10, 14, 16. Compute measures of skewness and kurtosis.


CHAPTER 5 


PROBABILITY AND THE BINOMIAL DISTRIBUTION 


5.1. Introduction 


In experimental work a line of theoretical speculation may lead to the 
formulation of a particular hypothesis. An experiment is conducted, and 
data obtained. How are the data interpreted? Do the data support the 
acceptance or rejection of the hypothesis? What rules of evidence apply? 
Questions of this type involve considerations of probability. The answers 
are in probabilistic terms. The assertions of the investigator are not made 
with certainty but have associated with them some degree of doubt, however 
small. 

Consider a hypothetical illustration. Two methods for the treatment of a 
disease are under consideration. Two groups of 20 patients suffering from 
the disease are selected. Method A is applied to one group, method B to the
other. Following a period of treatment, 16 patients in group A and 10
patients in group B show marked improvement. How may this difference
be evaluated? May it be argued from the data that treatment A is in general
superior to treatment B? Here the investigator proceeds by adopting a trial
hypothesis that no difference exists between the two treatments, that one 
treatment is no better than the other. He then estimates the probability of 
obtaining by random sampling under this trial hypothesis a difference equal 
to or greater than the one observed. If this probability is small, say the 
chances are less than 5 in 100, he may consider this sufficient evidence for the 
rejection of the trial hypothesis and may be prepared to assert that one 
method of treatment is better than the other. If the probability is not 
small and the observed difference may be expected to occur quite frequently 
under the trial hypothesis, say the chances are 20 in 100, then the evidence 
does not warrant the conclusion that one treatment is better than the other. 
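The text does not say at this point how the probability under the trial hypothesis is to be estimated. One possible way, offered here only as an illustrative sketch and not as the method of this book, is to suppose that the 26 patients who improved would have improved under either treatment and that the 40 patients were allotted to the two groups at random. The probability that 16 or more of the improved patients fall in group A can then be counted exactly. The function name and this choice of model are assumptions of the sketch.

```python
from math import comb

def prob_a_at_least(k_obs, n_a, n_b, improved_total):
    """Probability that group A holds k_obs or more of the improved patients
    when improved_total of the n_a + n_b patients improve regardless of
    treatment and groups are formed at random (a hypergeometric count)."""
    total = n_a + n_b
    ways_all = comb(total, n_a)
    lowest = max(k_obs, improved_total - n_b)   # group B can hold at most n_b
    highest = min(n_a, improved_total)
    ways = sum(comb(improved_total, k) * comb(total - improved_total, n_a - k)
               for k in range(lowest, highest + 1))
    return ways / ways_all

# 16 of 20 improved under method A, 10 of 20 under method B.
p_one_direction = prob_a_at_least(16, n_a=20, n_b=20, improved_total=26)
print(round(p_one_direction, 3))
```

Under these assumptions the probability of a difference this large in favor of method A comes to roughly .048. A difference of the same size in either direction would be about twice as probable. The sketch shows only the form of the argument. Later chapters of the text take up the procedures actually recommended.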

In general, the interpretation of the data of experiments is in probabilistic 
terms. The theory of probability is of the greatest importance in scientific 
work where questions about the correspondences between the deductive 
consequences of theory and observed data are raised. Probability theory 
had its origins in games of chance. It has become basic to the thinking of 
the scientist. 

5.2. The Nature of Probability 


Diverse views of the nature of probability may be entertained. The topic 
is controversial. No inclusive summary of these different views will be 
attempted here. We shall discuss three approaches to probability: (1) the 
subjective, or personalistic, (2) the formal mathematical, and (3) the empirical
relative-frequency approach. These different ways of regarding probability
are not incompatible.

The term probability may be used subjectively to refer to an attitude of 
doubt with respect to some future event. For example, the assertions may 
be made that “It will probably rain tomorrow,” or “The probability is 
small that I shall live to be 90 years old,” or “There is a high probability 
that a particular horse will win the Kentucky Derby.” Frequently, numeri- 
cal terms are used in making assertions of this kind, such as, “The odds are 
even that it will rain tomorrow,” or “I estimate that the chances are about
95 in 100 that I shall die before I am 90 years old,” or “The chances are three
to one that a particular horse will win the Kentucky Derby.” All such
assertions, whether numerical terms are used or not, refer to feelings of degrees
of doubt or confidence with regard to future outcomes. This subjective 
usage is sometimes spoken of as psychological, or personalistic, probability. 

A second usage defines the probability of an event as the ratio of the 
number of favorable cases to the total number of equally likely cases. This 
usage stems from a consideration of games of chance involving cards, dice, 
and coins. For example, on examining the structure of a die the assertion 
may be made that no basis exists for choosing one of the six alternatives in 
preference to another; consequently all six alternatives may be considered 
equally likely. The probability of throwing a particular result, say, a 3, in a
single toss is then 1/6, there being one favorable case among six equally likely
alternatives. This approach to probability involves a concept of equally 
likely cases, which has a degree of intuitive plausibility in relation to cards, 
dice, and coins. Difficulties present themselves, however, when we attempt 
to apply this approach in situations where it is impossible to delineate cases 
which can be construed to be equally likely. These difficulties have led to 
the argument that equally likely means the same as equally probable; 
therefore the definition is circular because it defines probability in terms of 
itself. Arguments have been advanced to escape this circularity. These 
need not detain us. The difficulty, however, is readily resolved by observing 
that the concept of equally likely in this definition of probability is a formal
postulate and is not empirical. In effect we say, “Let us postulate that 
certain events are equally likely, and given this postulate let us deduce certain 
consequences.” This means that a theory of probability employing this 
postulate is a formal mathematical model. It may or may not correspond to 
empirical events. It may be demonstrated, however, that this model does
approximate closely to certain empirical events and consequently is of value
in dealing with practical problems. 

The situation here is somewhat analogous to that in ordinary Euclidian
geometry. Euclidian geometry is a formal system comprised of a set of 
axioms, or primitive postulates, and their deductive consequences, called 
theorems. The proofs of the theorems hold regardless of questions of 
correspondence with the empirical world. We know, however, on the basis 
of lengthy experience, that these theorems can be shown to correspond 
closely to the world around us. In consequence, Euclidian geometry provides 
a valuable model for dealing with problems in engineering, surveying, 
building construction, and many other fields. Both with Euclidian geometry 
and probability it is useful to draw a clear distinction between the formal 
mathematical system and the empirical events for which the formal system 
may serve as a model. 

A third approach to probability is through a consideration of relative 
frequencies. If a series of trials is made, say, N, and a given event occurs
r times, then r/N is the relative frequency. This relative frequency may be
considered an estimate of a value p. If a longer series of trials is made, the
relative frequency will usually be closer to p. The difference between
r/N and p may be made as small as we like by increasing the value of N.
The probability p is defined as the limit approached by the relative frequency
as the number of trials is increased. This approach to probability requires
that a population of events be defined. Probability is the relative frequency
in the population. It is a population parameter. The relative frequency in
a sample of observations is an estimate of that parameter. To illustrate,
consider a coin. The population of events may be regarded as an indefinitely
large number of tosses which theoretically could be made. The proportion
of heads p in this population is the probability of a head. This is often
assumed to be 1/2. If the coin is tossed 100 times and a proportion .47 of heads
is obtained, this may be taken as an estimate of p.
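The behavior of the relative frequency as the number of trials grows can be imitated on a computer. The sketch below, in Python, simulates tosses of a coin with an assumed p of 1/2, using the pseudo-random generator of the standard library. The function name, the seed, and the numbers of tosses are arbitrary choices made for the illustration.

```python
import random

def relative_frequency_of_heads(n_tosses, p=0.5, seed=0):
    """Simulate n_tosses of a coin whose probability of heads is p and
    return the proportion of heads obtained, r/N."""
    rng = random.Random(seed)   # fixed seed so that the run can be repeated
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p)
    return heads / n_tosses

for n in (100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

In runs of this kind the proportion for the longer series usually lies closer to 1/2 than that for the shorter, although any single short series may happen by chance to fall close to it.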

The ways of regarding probability described here, the subjective or 
personalistic, the formal mathematical, and the empirical through the study 
of relative frequencies, are not incompatible, and indeed it may be argued 
that all three must of necessity coexist. While subjective, or personalistic, 
probability may be an interesting topic of psychological inquiry, in practical
statistical work use is made mainly of the formal mathematical and relative-frequency
approaches, the latter being the operational complement of the
former.


5.3. The Addition and Multiplication of Probabilities 


In throwing a die six possible events may occur. If we are prepared to
assume, as in the formal mathematical approach to probability, that these six
events are equally likely, then the probability of obtaining any particular
one of the faces 1, 2, 3, 4, 5, or 6 in a single throw is 1/6, the ratio of the
number of favorable cases to the number of equally likely cases. Consider now
the probability of obtaining either a 1, 2, or 3 in a single throw. Since there
are now three favorable cases among six equally likely cases, this probability
is readily observed to be 1/6 + 1/6 + 1/6 = 1/2.
This is an application of the addition theorem of probability. This theorem
states that the probability that any one of a number of mutually exclusive events
will occur is the sum of the probabilities of the separate events. "Mutually
exclusive" means that if one event occurs, the others cannot. To illustrate
further, in tossing two coins four possible events may occur. Both coins may
be heads, both may be tails, the first may be a head and the second a tail, or
the first may be a tail and the second a head. These events exhaust the
possible outcomes. They may be represented as HH, TT, HT, TH. Again,
if we assume these four events to be equally likely, the probability of any one
of the four events is 1/4. By the addition theorem the probability of either
two heads or two tails, that is, HH or TT, is 1/4 + 1/4 = 1/2.
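
The addition theorem can be checked by enumerating the equally likely outcomes and summing their probabilities. This is a small sketch under the equal-likelihood assumption made in the text; the helper name `prob` is an illustrative choice.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coins: HH, HT, TH, TT, assumed equally likely.
outcomes = ["".join(pair) for pair in product("HT", repeat=2)]
p_each = Fraction(1, len(outcomes))

def prob(event):
    """Add the probabilities of the mutually exclusive outcomes in event."""
    return sum((p_each for o in outcomes if o in event), Fraction(0))

p_same_face = prob({"HH", "TT"})                      # 1/4 + 1/4
p_low_die = sum(Fraction(1, 6) for _ in (1, 2, 3))    # 1/6 + 1/6 + 1/6
print(p_same_face, p_low_die)
```

Exact fractions are used so that the sums agree with the hand calculation without rounding.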

In throwing two dice the number of possible outcomes is 36, and the proba-
bility of any particular outcome, assuming these to be equally likely, is
1/36, which is the product of the two independent probabilities, or 1/6 × 1/6.
This is an application of the multiplication theorem of probability. This
theorem states that the probability of the joint occurrence of two or more
mutually independent events is the product of their separate probabilities. By
mutually independent is meant that the occurrence of one event does not
affect the occurrence of the other events. To illustrate, the probability of
obtaining four heads in four tosses of a coin is 1/2 × 1/2 × 1/2 × 1/2 = 1/16. The
probability of drawing the ace, king, and queen of spades in that order
in drawing one card from each of three well-shuffled decks of 52 cards is
1/52 × 1/52 × 1/52 = 1/140,608. The probability of drawing the ace, king, and
queen of spades in that order, and without replacement, from a single deck
of 52 cards is 1/52 × 1/51 × 1/50, or 1/132,600. The probability that the first
card is the ace of spades is 1/52. Having drawn one card, 51 cards remain, and
the probability that the second card is the king of spades is 1/51. Similarly,
the probability that the third card is the queen of spades is 1/50. The proba-
bility of the combined event is the product of the separate probabilities.
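
The products quoted above are easily verified with exact arithmetic. The sketch below is illustrative only; the variable names are assumptions introduced here.

```python
from fractions import Fraction
from math import prod

# Four heads in four independent tosses of an unbiased coin.
four_heads = Fraction(1, 2) ** 4

# Ace, king, queen of spades, one card from each of three separate decks.
three_decks = Fraction(1, 52) ** 3

# The same three cards in order from one deck, without replacement:
# 52 cards remain for the first draw, 51 for the second, 50 for the third.
one_deck = prod(Fraction(1, 52 - k) for k in range(3))

print(four_heads, three_decks, one_deck)
```

In the single-deck case each factor is the probability of the next card given the cards already drawn, which is why the denominators shrink.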


5.4. Permutations and Combinations 


A knowledge of permutations and combinations is useful in dealing with 
many problems involving probabilities.

Consider two objects labeled A and B. Two arrangements are possible,
AB and BA. With three objects labeled A, B, and C, six arrangements are
possible. These are ABC, ACB, BAC, BCA, CAB, and CBA. These
arrangements are called permutations. In general, if there are N dis-
tinguishable objects, the number of permutations of these objects taken N
at a time is given by N!, or N factorial, which is the product of all integers
from N to 1, or

$$N(N-1)(N-2)\cdots 3 \times 2 \times 1$$

For N = 3, N! = 3 × 2 × 1 = 6. For N = 5, N! = 5 × 4 × 3 × 2 × 1 = 120.
Consider the number of seating arrangements of eight guests in eight chairs
at a dinner table. The first guest may sit in any one of eight chairs. When
the first guest is seated, the second guest may sit in any one of the remaining
seven chairs. Thus the number of possible arrangements for the first two
guests is 8 × 7 = 56. When the first two guests are seated, the third guest
may occupy any one of the remaining six chairs, and so on for the remaining
guests. The number of possible seating arrangements for the eight guests is
8!, or 40,320, a number which explains the indecision of many hostesses.

Instead of considering the number of ways of arranging N things N at a
time, we may consider the number of ways of arranging N things r at a time,
where r is less than N. Thus the possible arrangements of the objects A,
B, and C taken two at a time are AB, AC, BA, BC, CA, and CB. Here we
observe that there are three ways of selecting the first object and two ways of
selecting the second. The number of arrangements is then 3 × 2 = 6.
Similarly, on considering the number of arrangements of 10 objects taken 3
at a time, we observe that there are 10 ways of selecting the first, 9 ways of
selecting the second, 8 ways of selecting the third. The number of arrange-
ments is then 10 × 9 × 8 = 720. In general, the number of permutations of
N things taken r at a time is

$$P_r^N = N(N-1)\cdots(N-r+1) = \frac{N!}{(N-r)!} \tag{5.1}$$

The number of ways of arranging three different letters from the word
"snark" is 5!/(5 - 3)! = 60.

Consider a situation where of N objects $n_1$ are indistinguishable one from
another; that is, they are alike, $n_2$ are indistinguishable, and so on. Let N
be comprised of k sets such that $n_1 + n_2 + \cdots + n_k = N$. The number
of permutations in this case is given by

$$P^N(n_1, n_2, \ldots, n_k) = \frac{N!}{n_1!\, n_2! \cdots n_k!} \tag{5.2}$$

To illustrate, consider nine objects, four red, three black, and two white.
The number of arrangements is 9!/(4! 3! 2!) = 1,260. In tossing five coins the
number of arrangements of three heads and two tails is given by 5!/(3! 2!) = 10.
These 10 arrangements are


HHHTT  HHTHT HTHHT  THHHT  HHTTH 
HTHTH THHTH HTTHH THTHH TTHHH 
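
The counts given in this section may be reproduced with the standard library. This is a sketch for checking the arithmetic; `multiset_permutations` is a hypothetical helper written here to mirror formula (5.2), not a library function.

```python
from itertools import permutations
from math import factorial, perm

def multiset_permutations(*counts):
    """N!/(n1! n2! ... nk!) for k groups of indistinguishable objects."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)   # each division is exact
    return result

seatings = factorial(8)                    # eight guests in eight chairs
three_of_ten = perm(10, 3)                 # 10 objects taken 3 at a time
snark = perm(5, 3)                         # three letters from "snark"
coloured = multiset_permutations(4, 3, 2)  # four red, three black, two white

# Distinct orderings of three heads and two tails, found by enumeration.
coin_orders = sorted(set("".join(p) for p in permutations("HHHTT")))
print(seatings, three_of_ten, snark, coloured, len(coin_orders))
```

Enumerating the coin arrangements and counting the distinct ones gives the same number as formula (5.2), which is a useful check on the formula for small cases.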


The number of different ways of selecting objects from a set, ignoring the
order in which they are arranged, is the number of combinations. Given
the objects A, B, C, and D, the number of permutations of two from this set is
4 × 3 = 12. The arrangements are AB, BA, AC, CA, AD, DA, BC, CB,
BD, DB, CD, and DC. Note that each pair of objects occurs in two different
orders. If we ignore the order in which each pair of objects is arranged, we
have the number of combinations. The number of combinations is then
(4 × 3)/2 = 6. In general, the number of different combinations of N things
taken r at a time is

$$C_r^N = \frac{N!}{r!\,(N-r)!} \tag{5.3}$$

The number of combinations of 10 things taken 3 at a time is 10!/(3! 7!) = 120.

The number of combinations of N things taken N at a time is clearly 1,
because there is only one way of picking all N objects, if we ignore the order
of their arrangement.
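
Formula (5.3) corresponds to `math.comb` in Python, and the pairs themselves can be listed with `itertools.combinations`. The sketch below is illustrative; the variable names are assumptions.

```python
from itertools import combinations
from math import comb

# Unordered pairs from the objects A, B, C, D.
pairs = ["".join(c) for c in combinations("ABCD", 2)]
print(pairs)

ten_take_three = comb(10, 3)   # 10!/(3! 7!)
all_of_five = comb(5, 5)       # N things taken N at a time
print(len(pairs), ten_take_three, all_of_five)
```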


5.5. The Binomial Distribution 


In tossing 10 coins what is the probability of obtaining 0, 1, 2, . . . , 10
heads? We are required to determine the probability of obtaining 0 heads
and 10 tails, 1 head and 9 tails, 2 heads and 8 tails, and so on. Let us
designate the 10 coins by the letters A, B, C, D, E, F, G, H, I, and J. Let
us assume that all 10 coins are unbiased and that the probability of throwing
a head or a tail on a single toss of any coin is 1/2.

Let us attend first to the probability of throwing 0 heads and 10 tails in
tossing all 10 coins. The probability that coin A is not a head is 1/2, that B is
not a head is 1/2, that C is not a head is 1/2, and so on. Therefore, from the
multiplication theorem of probability, the probability that all 10 coins are
not heads, or that they are tails, is obtained by multiplying 1/2 ten times; that
is, $(1/2)^{10}$, or 1/1,024. Thus in tossing 10 coins there is 1 chance in 1,024 of
obtaining 0 heads and 10 tails.

Now consider the problem of obtaining one head and nine tails. The
probability that coin A is a head is 1/2. The probability that all the remaining
nine coins are tails is $(1/2)^9$. Therefore the probability that A is a head and all
other nine coins are tails is $(1/2)^{10}$. It is readily observed, however, that one
head can occur in 10 different ways. A may be a head and all other coins
tails, B may be a head and all the others tails, and so on. Since one head can
occur in 10 different ways, the probability of obtaining one head and nine
tails is $10(1/2)^{10}$ = 10/1,024. Thus in tossing 10 coins there are 10 chances in
1,024 of obtaining one head and nine tails.

Determining the probability of obtaining two heads and eight tails may be
similarly approached. The probability that coins A and B are heads is
$(1/2)^2$. The probability that all the remaining coins are tails is $(1/2)^8$. The
probability that A and B are heads and all the remaining coins are tails is
$(1/2)^{10}$. We readily observe, however, that two heads can occur in quite a
number of different ways. This number is the number of combinations of
ten things taken two at a time, $C_2^{10}$, which is (10 × 9)/2 = 45. There-
fore the probability of obtaining two heads and eight tails is $45(1/2)^{10}$, or
45/1,024. Similarly, the probability of obtaining three heads and seven tails
is $C_3^{10}(1/2)^{10}$ = 120/1,024. Likewise, the probability of obtaining four heads
and six tails is $C_4^{10}(1/2)^{10}$ = 210/1,024; and so on. The probabilities of
obtaining different numbers of heads in tossing 10 coins are then as follows:


No. of heads    Probability
     10           1/1,024
      9          10/1,024
      8          45/1,024
      7         120/1,024
      6         210/1,024
      5         252/1,024
      4         210/1,024
      3         120/1,024
      2          45/1,024
      1          10/1,024
      0           1/1,024
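
The table can be regenerated by counting, for each number of heads r, the combinations of 10 coins taken r at a time and dividing by $2^{10}$ = 1,024. The following is a minimal sketch; the function name `symmetric_binomial` is an assumption made for illustration.

```python
from fractions import Fraction
from math import comb

def symmetric_binomial(n):
    """Probabilities of 0, 1, ..., n heads in tossing n unbiased coins."""
    return [Fraction(comb(n, r), 2 ** n) for r in range(n + 1)]

probs = symmetric_binomial(10)
for r in range(10, -1, -1):
    # Show each entry over the common denominator 1,024, as in the table.
    print(f"{r:>2}  {comb(10, r):>3}/1,024")
```

The probabilities sum to 1, since the eleven events exhaust the possible outcomes.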


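The above probabilities may be generated by the binomial expansion. If
p is the probability that an event will occur and q is the probability that it
will not occur and q + p = 1, the binomial expansion may be written in the
form

$$(q + p)^N = q^N + Nq^{N-1}p + \frac{N(N-1)}{1 \times 2}q^{N-2}p^2 + \frac{N(N-1)(N-2)}{1 \times 2 \times 3}q^{N-3}p^3 + \cdots + p^N \tag{5.4}$$

The terms of the expansion for N = 2, N = 3, N = 4 are as follows:

$$(q + p)^2 = q^2 + 2qp + p^2$$

$$(q + p)^3 = q^3 + 3q^2p + 3qp^2 + p^3$$

$$(q + p)^4 = q^4 + 4q^3p + 6q^2p^2 + 4qp^3 + p^4$$

In considering problems involving coins, p = q = 1/2 and the required proba-
bilities are generated by the expansion

$$\left(\frac{1}{2} + \frac{1}{2}\right)^N = \left(\frac{1}{2}\right)^N + N\left(\frac{1}{2}\right)^N + \frac{N(N-1)}{1 \times 2}\left(\frac{1}{2}\right)^N + \frac{N(N-1)(N-2)}{1 \times 2 \times 3}\left(\frac{1}{2}\right)^N + \cdots + \left(\frac{1}{2}\right)^N \tag{5.5}$$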


This is known as the symmetrical binomial. This expansion generates the
probabilities of obtaining different numbers of heads in tossing N unbiased
coins. Where N = 2, we have $(1/2)^2 + 2(1/2)^2 + (1/2)^2$, or 1/4, 1/2, and 1/4, as the
probabilities of obtaining zero, one, and two heads, respectively, in tossing
two coins. Where N = 3 we have $(1/2)^3 + 3(1/2)^3 + 3(1/2)^3 + (1/2)^3$, or 1/8, 3/8, 3/8,
and 1/8, as the probabilities of obtaining zero, one, two, and three heads,
respectively, in tossing three coins. In throwing an unbiased die the proba-
bility that a 6 will occur is 1/6. The probability that a 6 will not occur is 5/6.
In throwing N dice the probabilities of different numbers of 6's are given by
$(5/6 + 1/6)^N$. If N = 3 we have $(5/6)^3 + 3(5/6)^2(1/6) + 3(5/6)(1/6)^2 + (1/6)^3$, or
125/216, 75/216, 15/216, and 1/216, as the probabilities of obtaining zero, one,
two, and three 6's, respectively.
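
The terms of $(q + p)^N$ can be computed for any p, not only for p = 1/2. The sketch below, with the hypothetical helper `binomial_terms`, reproduces the three-dice example in exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_terms(n, p):
    """Terms of (q + p)^n in order of r = 0, 1, ..., n occurrences."""
    q = 1 - p
    return [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]

# Number of 6's in throwing three unbiased dice: p = 1/6, q = 5/6.
sixes = binomial_terms(3, Fraction(1, 6))
print([str(t) for t in sixes])   # fractions are shown in lowest terms
```

Because `Fraction` reduces to lowest terms, 75/216 prints as 25/72 and 15/216 as 5/72; the values are the same as those in the text.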
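Any term in the binomial expansion may be written as

$$C_r^N p^r q^{N-r} = \frac{N!}{r!\,(N-r)!}\, p^r q^{N-r} \tag{5.6}$$

where $C_r^N$ is the number of combinations of N things taken r at a time.
Thus the probability of obtaining three heads in ten tosses is

$$\frac{10!}{3!\,(10-3)!}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^7 = \frac{120}{1{,}024}$$

The coefficients $C_r^N$ in any expansion are

$$1, \quad N, \quad \frac{N(N-1)}{1 \times 2}, \quad \frac{N(N-1)(N-2)}{1 \times 2 \times 3}, \quad \ldots$$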


These coefficients may be rapidly obtained for different values of N from
what is known as Pascal's triangle. The coefficients for different values of N
are written in rows in the form of a triangle as shown in Table 5.1. The
number in any row is the sum of the two numbers to the left and right on the
row above.

TABLE 5.1
PASCAL'S TRIANGLE

 N
 1    1   1
 2    1   2   1
 3    1   3   3   1
 4    1   4   6   4   1
 5    1   5  10  10   5   1
 6    1   6  15  20  15   6   1
 7    1   7  21  35  35  21   7   1
 8    1   8  28  56  70  56  28   8   1
 9    1   9  36  84 126 126  84  36   9   1
10    1  10  45 120 210 252 210 120  45  10   1

This device is very useful in generating expected frequencies
and probabilities. For example, for N = 10, the entries in the triangle are
the expected frequencies of heads, or tails, in tossing 10 coins 1,024 times.
The required probabilities in this case are obtained by dividing the frequencies
by 1,024.
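
The rule for building the triangle, each number being the sum of the two numbers above it, can be written in a few lines. This is a sketch only; `pascal_rows` is an illustrative name, and padding each row with a zero at both ends is one convenient way of handling the edges.

```python
def pascal_rows(max_n):
    """Rows of Pascal's triangle for N = 1, 2, ..., max_n."""
    row = [1]
    rows = []
    for _ in range(max_n):
        # Pad with a zero on each side, then add neighbouring pairs.
        row = [a + b for a, b in zip([0] + row, row + [0])]
        rows.append(row)
    return rows

triangle = pascal_rows(10)
for n, row in enumerate(triangle, start=1):
    print(n, row)
```

The entries in the row for N sum to $2^N$, which is why dividing the row for N = 10 by 1,024 gives probabilities.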


5.6. Properties of the Binomial 


For the symmetrical binomial, where p = q = 1/2, the mean, variance,
skewness, and kurtosis are

$$\mu = \frac{N}{2} \qquad \sigma^2 = \frac{N}{4} \qquad g_1 = 0 \qquad g_2 = -\frac{2}{N} \tag{5.7}$$

In tossing five coins 32 times, the expected frequencies of zero, one, two, three,
four, and five heads are 1, 5, 10, 10, 5, and 1. These frequencies are the
coefficients of the expansion $(1/2 + 1/2)^5$ and are obtained from Table 5.1. The
mean, variance, skewness, and kurtosis of the expected distribution of
heads may be obtained by direct calculation. These values may, how-
ever, be obtained very readily by using the above formulas. The mean is
μ = N/2 = 5/2 = 2.5. The variance of the distribution is σ² = N/4 = 5/4 = 1.25.
Because the distribution is symmetrical, the measure of skewness $g_1$ is equal
to 0. The measure of kurtosis $g_2$ = -2/N = -2/5 = -.40. Note that as
N increases in size, $g_2$ becomes smaller in absolute value.
In general, the mean, variance, skewness, and kurtosis of any binomial
distribution are given by

$$\mu = Np \qquad \sigma^2 = Npq \qquad g_1 = \frac{q - p}{\sqrt{Npq}} \qquad g_2 = \frac{1 - 6pq}{Npq} \tag{5.8}$$

In tossing an unbiased die, the probability p of throwing a 6 is 1/6 and the proba-
bility q of not throwing a 6 is 5/6. The expected frequency distribution of 6's
in tossing 10 dice 1,024 times is given by the terms of the expansion
$(5/6 + 1/6)^{10}$, each multiplied by 1,024. The mean of this distribution is
μ = Np = 10/6 = 1.667. The variance is σ² = Npq = 10 × 1/6 × 5/6 = 1.389.
The skewness $g_1$ of the distribution is .566, and the kurtosis $g_2$ is .12. Note
that as N increases in size both $g_1$ and $g_2$ approach zero as a limit.
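
The formulas may be compared with a direct calculation from the distribution itself. The sketch below assumes that $g_1$ and $g_2$ are the moment ratios $m_3/m_2^{3/2}$ and $m_4/m_2^2 - 3$, where $m_k$ is the kth moment about the mean; that definition and the function names are assumptions stated for this illustration.

```python
from math import comb, sqrt

def binomial_moments(n, p):
    """(mean, variance, g1, g2) computed directly from the probabilities."""
    q = 1 - p
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * pr for r, pr in enumerate(probs))
    # Second, third, and fourth moments about the mean.
    m2, m3, m4 = (
        sum((r - mean) ** k * pr for r, pr in enumerate(probs))
        for k in (2, 3, 4)
    )
    return mean, m2, m3 / m2**1.5, m4 / m2**2 - 3

def binomial_formulas(n, p):
    """The same four quantities from the closed formulas."""
    q = 1 - p
    return n * p, n * p * q, (q - p) / sqrt(n * p * q), (1 - 6 * p * q) / (n * p * q)

coins = binomial_moments(5, 0.5)      # five unbiased coins
dice = binomial_moments(10, 1 / 6)    # 6's in throwing ten dice
print(coins)
print(dice)
```

Under this definition the direct calculation and the formulas agree to within rounding for both examples in the text.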


5.7. A Hypothetical Experiment 


The binomial distribution is frequently used as a model in evaluating 
experimental results. Such uses of the binomial may be illustrated with 
reference to a hypothetical experiment. 

An individual asserts that he has certain psychic powers which enable him 
to predict the outcome of future events. An experiment is arranged involv- 
ing the tossing of a coin. The individual is required to predict the outcome 
in 10 tosses. If we operate on the working hypothesis that the individual 
possesses no powers of the type claimed, the probability of a correct prediction 
by chance alone in a single toss of the coin is 1/2. From the binomial expansion
$(1/2 + 1/2)^{10}$ we can ascertain the probabilities of different numbers of correct
predictions. Thus the probability of the individual successfully predicting
the outcome in all 10 trials by chance alone is 1/1,024, or .00098. The
probability of nine successful predictions and one failure is 10/1,024, or
.00977. The probability of eight successful predictions and two failures is
45/1,024, or .04395, and so on. The probability of nine or more successful
predictions is .00977 + .00098 = .01075, and the probability of eight or
more successful predictions is .04395 + .00977 + .00098 = .05470. Now
clearly, before undertaking the experiment, some agreement must be reached 
regarding the number of correct predictions we are prepared to accept as 
evidence for the rejection of the hypothesis that the individual possesses no 
powers of the type claimed. 

We may agree arbitrarily that if the results obtained in the experiment 
could have occurred by chance with a small probability only, say, equal to or 
less than .05, then these results would be accepted as at least not incompatible 
with the claims for psychic powers. We observe that the probability of eight 
or more correct predictions by chance alone is .05470. This is greater than
the .05 probability we have agreed to accept; consequently eight correct
predictions would in this case not be considered sufficient evidence. The
only possibilities here which would prove acceptable within the criterion
adopted are nine or ten correct predictions.

The experiment is conducted; seven correct predictions and three failures
are obtained. The probability of seven or more correct predictions occurring 
by chance alone in ten trials may be calculated from the binomial distribution 
and is 176/1,024, or .17189. Thus there are about 17 chances in 100 of 
obtaining a result by ordinary guessing equal to or better than the one 
observed. In consequence, the experimental results provide no acceptable 
basis for rejecting the working hypothesis that the individual possesses no 
powers of the type claimed. 
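
The tail probabilities used in this argument can be obtained by summing binomial coefficients. The following is a sketch under the working hypothesis p = 1/2; the name `upper_tail` and the search for a critical number are illustrative additions.

```python
from fractions import Fraction
from math import comb

def upper_tail(n, k):
    """Probability of k or more correct predictions in n trials by chance."""
    return Fraction(sum(comb(n, r) for r in range(k, n + 1)), 2 ** n)

n = 10
for k in (9, 8, 7):
    print(k, upper_tail(n, k), round(float(upper_tail(n, k)), 5))

# Smallest number of correct predictions whose chance probability
# does not exceed the agreed criterion of .05.
critical = next(k for k in range(n + 1) if upper_tail(n, k) <= Fraction(5, 100))
print(critical)
```

The exact value of 176/1,024 is .171875; the figure .17189 in the text comes from adding terms already rounded to five places.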

Let us suppose that the individual had made 10 correct predictions. Could
we reasonably argue from this result that the individual in question did in
fact possess psychic power? Quite clearly, such a result is not incompatible
with the assertion of psychic power and provides no basis for rejecting that
assertion. We observe, however, that circumstances other than the posses- 
sion of psychic power may possibly have led to the result obtained; that is, 
alternative explanations of the results may be possible. 

In experimental situations of the type described we would ordinarily 
require more than 10 trials. Let us suppose that 1,000 trials had been made 
and 550 correct predictions obtained. The probabilities required to evaluate 
this result would then be generated by the expansion $(1/2 + 1/2)^{1000}$. Quite
clearly, the calculation of the required probabilities directly from the binomial 
would involve almost prohibitive arithmetical labor. Fortunately, a very 
close approximation to the required probabilities can be readily obtained 
from the normal probability distribution, which we shall now consider. 
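
With a computer the exact binomial sum for 1,000 trials is no longer laborious, and it offers a check on the normal approximation. The sketch below is an illustration under the hypothesis p = 1/2; `upper_tail` is an assumed helper name.

```python
from math import comb

def upper_tail(n, k):
    """Exact probability of k or more successes in n trials with p = 1/2."""
    favourable = sum(comb(n, r) for r in range(k, n + 1))
    # Python integers are exact; only this final division rounds.
    return favourable / 2 ** n

p_550 = upper_tail(1000, 550)
print(p_550)   # probability of 550 or more correct predictions by chance
```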


EXERCISES 


1. In rolling a die, what is the probability of obtaining either a 5 or a 6?

2. In rolling two dice, what is the probability of obtaining either a 7 or an 11?

3. In dealing four cards without replacement from a well-shuffled deck, what is the prob-
ability of obtaining four aces?

4. On four consecutive rolls of a die a 6 is obtained. What is the probability of obtaining
a 6 on the fifth roll?

5. How would you proceed to estimate the probability that a sentence selected at random
from this book contains more than 12 words?

6. In seating eight people at a table with eight chairs, what is the number of possible seat-
ing arrangements?

7. In how many ways can two people seat themselves at a table with four chairs?

8. In tossing three coins, what is the probability of obtaining two heads and one tail?

9. In how many ways can a committee of three be chosen from a group of five men?

10. Assume that intelligence and honesty are independent. If 10 per cent of a population
are intelligent and 60 per cent are honest, what is the probability that an individual
selected at random is both intelligent and dishonest?

11. What is the expected distribution of heads in tossing six coins 64 times?

12. What is the expected distribution of 6's in rolling six dice 64 times?

13. What is the probability of obtaining either nine or more heads or three or less heads in
tossing 12 coins?


CHAPTER 6 


THE NORMAL CURVE 


6.1. Introduction 


The frequency distributions of many events in nature are found in practice 
to be approximated closely by a particular bell-shaped type of curve known as 
the normal curve. Errors of measurement and errors made in estimating 
population values from sample values are often assumed to be normally 
distributed. The frequency distributions of many physical, biological, and 
psychological measurements are observed to approximate the normal form. 
Because the frequency of occurrence of many events in nature can be shown 
empirically to conform fairly closely to the normal curve, this curve can be 
used as a model in dealing with problems involving these events. Before 
proceeding with a detailed discussion of the normal curve, let us consider 
briefly the nature of functions and frequency curves in general.


6.2. Functions and Frequency Curves 


When two variables are so related that the values of one depend on the 
values of the other they are said to be functions of each other. A function is 
descriptive of change in one variable with change in another. The area of a 
circle is a function of the radius, and the volume of a cube is a function of the 
length of the edge. Consider the equation Y = bX + a. This is a linear
function. It is the equation for a straight line; Y and X are variables; b and
a are constants. If b and a are known, different values of X can be substi-
tuted in the equation and the corresponding values of Y obtained. If the
paired values of Y and X are plotted on graph paper, Y on the vertical and X
on the horizontal axis, a straight line results. Y and X bear a functional
relation to each other, and this relation is linear. Y is sometimes spoken of
as the dependent and X the independent variable. A functional relation
may be written in the general form Y = f(X). This simply states that Y
is some function of X. Here the nature of the function is not specified.

Consider now the binomial $(q + p)^N$. The terms in the binomial expan-
sion are the expected relative frequencies or probabilities associated with
particular events. Inspection of formula (5.4) indicates that any term in
the binomial expansion is given by

$$p_r = C_r^N p^r q^{N-r} \tag{6.1}$$


where $p_r$ is the probability of the rth event. This expression is a function.
For fixed values of N, p, and q, different values of r may be substituted on
the right and the corresponding values of $p_r$ obtained. Here $p_r$ is the depend-
ent and r the independent variable. The variable r is restricted to the
N + 1 values 0, 1, 2, . . . , N; consequently $p_r$ is also restricted to a fixed
number of possible values. The paired values of $p_r$ and r may be plotted on
graph paper, $p_r$ on the vertical and r on the horizontal axis. The resulting
graph is a visual description of the functional relation between the event r
and its relative frequency or probability $p_r$.

In the binomial the variable r is discrete and not continuous. In tossing
50 coins, for example, the number of heads or tails obtained is a discrete
number. The value of $p_r$ changes from r to r + 1 by discrete steps. We
observe, however, that as N increases in size we obtain a larger and larger
number of graduations of the distribution and by increasing the size of N we
can make the graduations as fine as we like. By considering the situation
where N becomes indefinitely large, that is, N approaches infinity, we arrive
at the conception of a continuous frequency curve or function. This curve is 
the limiting form of the binomial. 


Fig. 6.1. Frequency curve showing area between X = a and X = b.


Frequency curves are in certain instances conceptualized as extending 
along the X axis from minus infinity to plus infinity; that is, the curves taper 
off to zero at the two extremities. Although this is so, the area between the 
curve and the horizontal axis is always finite. For convenience this area is 
often taken as unity. 

On occasion it becomes necessary to find the proportion of the total area 
of the curve between ordinates erected at particular values of X, that is,
between X = a and X = b as shown in Fig. 6.1. This proportion is the
probability that a particular value of X drawn at random from the population 
which the curve describes falls between a and b. Because of this, frequency 
curves are often referred to as probability curves or probability distributions. 
Statisticians use a variety of theoretical frequency curves as models. The 
normal curve is one of the more important of these. 


6.3. The Normal Curve 


In tossing N coins the frequency distribution of heads or tails is approxi-
mated more closely by the normal distribution as N increases in size. The
normal curve is the limiting form of the symmetrical binomial. The equation
for the normal curve is

$$Y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2/2\sigma^2} \tag{6.2}$$

where Y = height of curve for particular values of X
      π = a constant = 3.1416
      e = base of Napierian logarithms = 2.7183
      N = number of cases, which means that the total area under the
          curve is N
      μ and σ = mean and standard deviation of the distribution, respectively

We have used the notation μ and σ in this formula to represent the mean and
standard deviation, instead of X̄ and s, because the formula is a theoretical
model. Presumably μ and σ may be regarded as population parameters.
If N, μ, and σ are known, different values of X may be substituted in the
equation and the corresponding values of Y obtained. If paired values of
X and Y are plotted graphically, they will form a normal curve with mean μ,
standard deviation σ, and area N.
The normal curve is usually written in standard-score form. Standard
scores have a mean of zero and a standard deviation of 1. Thus μ = 0 and
σ = 1. The area under the curve is taken as unity; that is, N = 1. With
these substitutions we may write

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \tag{6.3}$$

Here z is a standard score on X and is equal to (X - μ)/σ. The score z is a
deviation in standard deviation units measured along the base line of the
curve from a mean of zero, deviations to the right of the mean being positive
and those to the left negative. The curve has unit area and unit standard
deviation. By substituting different values of z in the above formula,
different values of y may be calculated. When z = 0, $y = 1/\sqrt{2\pi}$ = .3989.
This follows from the fact that $e^0$ = 1. Any term raised to the zero power is
equal to 1. Thus the height of the ordinate at the mean of the normal
curve in standard-score form is given by the number .3989. For z = ±1,
y = .2420, and for z = ±2, y = .0540. Similarly, the height of the curve
may be calculated for any value of z. In practice the student is not required
to substitute different values of z in the normal-curve formula and solve for
y to obtain the height of the required ordinate. These values may be
obtained from Table A of the Appendix. This table shows different values
of y corresponding to different values of z. It also shows the area of the curve
falling between the ordinates at the mean and different values of z.
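
The ordinates quoted above follow directly from the standard-score form of the normal curve. A minimal sketch, with `ordinate` as an illustrative function name:

```python
from math import exp, pi, sqrt

def ordinate(z):
    """Height of the unit normal curve at standard score z."""
    return exp(-z * z / 2) / sqrt(2 * pi)

for z in (0, 1, -1, 2, -2):
    print(z, round(ordinate(z), 4))
```

Because z enters only as z², the heights at +z and -z are equal, which is the symmetry of the curve.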


Fig. 6.2. Normal curve showing height of the ordinate at different values of x/σ, or z.


The general shape of the normal curve can be observed by inspection of 
Fig. 6.2. The curve is symmetrical. It is asymptotic at the extremities; 
that is, it approaches but never reaches the horizontal axis. It can be said 
to extend from minus infinity to plus infinity. The area under the curve is 
finite. 


6.4. Areas under the Normal Curve 


For many purposes it is necessary to ascertain the proportion of the area 
under the normal curve between ordinates at different points on the base line. 
We may wish to know (1) the proportion of the area under the curve between 
an ordinate at the mean and an ordinate at any specified point either above 
or below the mean, (2) the proportion of the total area above or below an 
ordinate at any point on the base line, (3) the proportion of the area falling 
between ordinates at any two points on the base line. 

Table A of the Appendix shows the proportion of the area between the 
mean of the unit normal curve and ordinates extending from z = 0 to z = 3.
Let us suppose that we wish to find the area under the curve between the
ordinates at z = 0 and z = +1. We note from Table A that this area is
.3413 of the total. Thus approximately 34 per cent of the total area falls
between the mean and one standard deviation unit above the mean. The
proportion of the area of the curve between z = 0 and z = 2 is .4772. Thus
about 47.7 per cent of the area of the curve falls between the mean and two
standard deviation units above the mean. The proportion of the area
between z = 0 and z = 3 is .49865, or a little less than 49.9 per cent.

The proportion of the area falling between z = 0 and z = +1 is .3413.
Since the curve is symmetrical the proportion of the area falling between
z = 0 and z = -1 is also .3413. The proportion of the area falling between
the limits z = ±1 is therefore .3413 + .3413 = .6826, or roughly 68 per cent.
The proportion of the area falling between z = ±2 is .4772 + .4772 = .9544,
or about 95 per cent. The proportion between z = ±3 is

.49865 + .49865 = .99730

or 99.73 per cent. The area outside these latter limits is very small and is
only .27 per cent. For rough practical purposes the curve is sometimes
taken as extending from z = -3 to z = +3.

Fig. 6.3. Normal curve showing areas between ordinates at different values of x/σ, or z.

Consider the determination of the proportion of the total area above or 
below any point on the base line of the curve. For illustrative purposes let 
the point be z = 1. The proportion of the area between the mean and
z = 1 is .3413. The proportion of the area below the mean is .5000. The
proportion of the total area below z = 1 is therefore .5000 + .3413 = .8413.
The proportion above this point is 1.0000 — .8413 = .1587. Similarly, the 
proportion of the area above or below any point on the base line can be 
readily ascertained. 




Consider the problem of finding the area between ordinates at any two 
points on the base line. Let us assume that we require the area between 
z = .5 and z = 1.5. From Table A of the Appendix we note that the
proportion of the area between the mean and z = .50 is .1915. We note also
that the area between the mean and z = 1.5 is .4332. The area between
z = .50 and z = 1.5 is obtained by subtracting one area from the other and
is .4332 - .1915 = .2417. The area for any other segment of the curve may
be similarly obtained. 
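
Areas of the kind tabled in Table A can be computed from the error function in Python's `math` module. The sketch below is illustrative; `area_mean_to` is an assumed name for the area between the mean and a given z.

```python
from math import erf, sqrt

def area_mean_to(z):
    """Proportion of the unit normal area between the mean and z."""
    return 0.5 * erf(z / sqrt(2))

print(round(area_mean_to(1), 4))                      # mean to z = 1
print(round(2 * area_mean_to(2), 4))                  # between z = -2 and z = +2
print(round(0.5 + area_mean_to(1), 4))                # below z = 1
print(round(area_mean_to(1.5) - area_mean_to(0.5), 4))  # between z = .5 and z = 1.5
```

The three kinds of question listed at the start of the section all reduce to sums and differences of this one quantity.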

On occasion we wish to find values of z which include some specified propor-
tion of the total area. For example, the values of z above and below the
mean which include a proportion .95 of the area may be required. We select
a value of z above the mean which includes a proportion .475 of the total area
and a value of z below the mean which also includes a proportion .475 of the
total area. From Table A of the Appendix we observe that the proportion
.475 of the area falls between z = 0 and z = 1.96. Since the curve is sym-
metrical the proportion .475 of the area falls between z = 0 and z = -1.96.
Thus a proportion .95, or 95 per cent, of the total area falls within the limits
z = ±1.96. Also a proportion .05, or 5 per cent, falls outside these limits.
Similarly, it may be shown that 99 per cent of the area of the curve falls
within, and 1 per cent outside, the limits z = ±2.58.

Fig. 6.4. Normal curve showing area between ordinates at z = .50 and z = 1.50.
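
The reverse problem, finding z for a given area, can be handled with `statistics.NormalDist` from the Python standard library (version 3.8 or later). This is a sketch; `symmetric_limit` is an illustrative helper name.

```python
from statistics import NormalDist

unit_normal = NormalDist()   # mean 0, standard deviation 1

def symmetric_limit(proportion):
    """z such that the limits -z to +z include the given proportion."""
    # Half of the excluded area lies in each tail.
    return unit_normal.inv_cdf(0.5 + proportion / 2)

print(round(symmetric_limit(0.95), 2))
print(round(symmetric_limit(0.99), 2))
```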


6.5. Areas under the Normal Curve— Illustrative Example 


The distribution of intelligence quotients obtained by the application of a 
particular test is approximately normal with a mean of 100 and a standard 
deviation of 15. We are required to estimate what per cent of individuals in 
the population have intelligence quotients of 120 and above. The intel-
ligence quotient of 120 in standard-score form is z = (120 - 100)/15 = 1.33.
Thus an intelligence quotient of 120 is 1.33 standard deviation units above
the mean. Reference to a table of areas under the normal curve shows that
the proportion of the area above a standard score of 1.33 is .092. Thus we
estimate that on this particular test about 9.2 per cent of the population
have intelligence quotients equal to or greater than 120.

We are required to estimate for the same test the middle range of intelli- 
gence quotients which includes 50 per cent of the population. A table of 
areas under the normal curve shows that 25 per cent of the area under the 
curve falls between the mean and a standard score of -.675. Also 25 per 
cent of the area falls between the mean and a standard score of +.675. Thus 
50 per cent of the area falls between the limits z = ±.675. The standard-score 
scale has a mean of zero and a standard deviation of unity. Here we 
must transform standard scores to the original scale of intelligence quotients 
with a mean of 100 and a standard deviation of 15. To transform 
standard scores to intelligence quotients we multiply the standard 
score by 15 and add 100. Thus the standard score -.675 is transformed to 
15 × (-.675) + 100 = 89.88 and +.675 to 15 × .675 + 100 = 110.12. 
Thus we estimate that about 50 per cent of the population have intelligence 
quotients within a range of roughly 90 to 110. 
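
The two estimates in this example can be checked with a short computation. This is a sketch only, assuming Python's standard-library `statistics.NormalDist` in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

# Mean and standard deviation are those given in the example.
iq = NormalDist(mu=100, sigma=15)

# Standard score of an IQ of 120 and the proportion at or above it.
z_120 = (120 - iq.mean) / iq.stdev
prop_above_120 = 1 - iq.cdf(120)

# Middle range holding 50 per cent: 25 per cent on each side of the mean.
lower = iq.inv_cdf(0.25)
upper = iq.inv_cdf(0.75)

print(round(z_120, 2), round(prop_above_120, 3))
print(round(lower, 2), round(upper, 2))
```

Because the computation carries the unrounded standard score 1.333 rather than 1.33, the proportion comes out near .091 rather than the tabled .092; the limits of the middle range agree with 89.88 and 110.12.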


6.6. The Normal Approximation to the Binomial 


The observation has been made that as N increases in size the symmetrical 
binomial is more closely approximated by the normal distribution. This 
means that the normal distribution may be used to estimate binomial 
probabilities. Consider a situation where ten coins are tossed a large number 
of times. What is the probability of obtaining seven or more heads? 
Here the mean of the binomial is $\mu = Np = 10 \times \tfrac{1}{2} = 5.0$ and the standard 
deviation is $\sigma = \sqrt{Npq} = \sqrt{10 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 1.58$. Because the normal distribution is con- 
tinuous, and not discrete, we consider the value 7 as covering the exact 
limits 6.5 to 7.5. Thus we must ascertain the proportion of the area of the 
normal curve falling above an ordinate at 6.5, the mean of the curve being 
5.0 and the standard deviation 1.58. In standard-score form the value 6.5 is 
equivalent to z = (6.5 - 5.0)/1.58 = .949. The proportion of the area of 
the normal curve falling above an ordinate at z = .949 can be readily ascer- 
tained from Table A of the Appendix and is .171. Thus using the normal- 
curve approximation to the binomial we estimate the probability of obtaining 
seven or more heads in tossing ten coins as .171. We may compare this with 
the exact probability obtained directly from the binomial expansion 
shown in Table 6.1. This probability is .172. Here we note that the 
discrepancy between the estimate obtained from the normal curve and the 
exact binomial probability is trivial. 




Table 6.1 compares the binomial and normal probabilities for N = 10 
and p = 1/2. We note that in this instance the differences between the exact 
binomial probabilities and the corresponding normal approximations are 
small. 

TABLE 6.1 
COMPARISON OF BINOMIAL PROBABILITIES WITH CORRESPONDING NORMAL 
APPROXIMATIONS FOR N = 10 AND p = 1/2 

No. of heads    Exact binomial probability    Normal approximation 
     10                  .001                        .002 
      9                  .010                        .011 
      8                  .044                        .044 
      7                  .117                        .114 
      6                  .205                        .205 
      5                  .246                        .248 
      4                  .205                        .205 
      3                  .117                        .114 
      2                  .044                        .044 
      1                  .010                        .011 
      0                  .001                        .002 
  Total                 1.000                       1.000 


The accuracy of the approximation depends both on N and p; as N 
increases in size the accuracy of the approximation is improved. For any 
N as p departs from 1/2 the approximation becomes less accurate. 
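
The comparison in Table 6.1 can be sketched in a few lines of code. The example below is illustrative only: it assumes Python's standard library (`math.comb`, `statistics.NormalDist`) in place of the binomial expansion and Table A, and the function names `exact_binomial` and `normal_approx` are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def exact_binomial(k, n, p):
    """Exact probability of k heads in n tosses."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_approx(k, n, p):
    """Area of the fitted normal curve over the exact limits k - .5 to k + .5."""
    curve = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return curve.cdf(k + 0.5) - curve.cdf(k - 0.5)

n, p = 10, 0.5

# Seven or more heads: exact sum, and the normal area above the ordinate at 6.5.
exact_7_plus = sum(exact_binomial(k, n, p) for k in range(7, n + 1))
approx_7_plus = 1 - NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(6.5)

# One line per number of heads: exact probability, then normal approximation.
for k in range(n, -1, -1):
    print(k, round(exact_binomial(k, n, p), 3), round(normal_approx(k, n, p), 3))
print(round(exact_7_plus, 3), round(approx_7_plus, 3))
```

Changing `n` or `p` in this sketch is a quick way to see the approximation improve as N increases and deteriorate as p departs from 1/2.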


6.7. Summary of Properties of the Normal Curve 


The following is a summary of properties of the normal curve. 

1. The curve is symmetrical. The mean, median, and mode coincide. 

2. The maximum ordinate of the curve occurs at the mean, that is, where 
z = 0, and in the unit normal curve is equal to .3989. 

3. The curve is asymptotic. It approaches but does not meet the hori- 
zontal axis and extends from minus infinity to plus infinity. 

4. The points of inflection of the curve occur at points ±1 standard 
deviation unit above and below the mean. Thus the curve changes from 
convex to concave in relation to the horizontal axis at these points. 

5. Roughly 68 per cent of the area of the curve falls within the limits ±1 
standard deviation unit from the mean. 

6. In the unit normal curve the limits z = ±1.96 include 95 per cent and 
the limits z = ±2.58 include 99 per cent of the total area of the curve, 5 per 
cent and 1 per cent of the area, respectively, falling beyond these limits. 




EXERCISES 


1. Find the height of the ordinate of the normal curve at the following z values: -2.15, 
-1.53, +.07, +.99, +2.76. 

2. Consider a normally distributed variable with X̄ = 50 and s = 10. For N = 200 find 
the height of the ordinates at the following values of X: 25, 35, 49, 57, and 63. 

3. Find the proportion of the area of the normal curve (a) between the mean and z = 1.49, 
(b) between the mean and z = 1.26, (c) to the right of z = .25, (d) to the right of z = 
1.50, (e) to the left of z = -1.26, (f) to the left of z = .95, (g) between z = ±.50, (h) 
between z = -.75 and z = 1.50, (i) between z = 1.00 and z = 1.96, (j) between z = 
1.00 and z = 1.01. 

4. Find a value of z such that the proportion of the area (a) to the right of z is .25, (b) to 
the left of z is .90, (c) between the mean and z is .40, (d) between ±z is .80. 

5. On the assumption that IQ's are normally distributed in the population with a mean of 
100 and a standard deviation of 15, find the proportion of people with IQ's (a) above 
135, (b) above 120, (c) below 90, (d) between 75 and 125. 

6. A teacher decides to fail 25 per cent of the class. Examination marks are roughly 
normally distributed, with a mean of 72 and a standard deviation of 6. What mark 
must a student make to pass? 

7. In tossing 200 coins, estimate, using the normal approximation to the binomial, the 
probability of obtaining (a) more than 150 heads, (b) less than 75 heads, (c) between 
75 and 125 heads. 

8. Error scores on a maze test for a particular strain of rats are known through prolonged 
experimentation to have an approximately normal distribution with a mean of 32 and 
a standard deviation of 8. In one experiment a control sample of six animals contains 
one animal with an error score of 66. What arguments may be advanced for discard- 
ing the results for this animal? 

9. Scores on a particular psychological test are normally distributed with a mean of 48 
and a standard deviation of 14. The decision is made to use a letter grade system A, 
B, C, D, and E, with the proportions .10, .20, .40, .20, and .10 in the five grades, respec- 
tively. Find the score intervals for the five letter grades. 

10. The following are data for test scores for two age groups: 

          11-year group    14-year group 
    X̄          38               56 
    s           8               12 
    N         500              800 

Assuming normality, estimate how many of the 11-year-olds do better than the aver- 
age 14-year-old and how many of the 14-year-olds do worse than the average 11-year- 
old. 

11. Recovery times for a particular disease under treatment A are known to be approxi- 
mately normal, with a mean of 33 days and a standard deviation of 4. One patient 
receives treatment B and recovers in 24 days. On the assumption that the distribu- 
tion of recovery times under treatment B is also normal with a standard deviation of 4, 
estimate from this single case the probability that the effects of the two treatments on 
recovery time do not differ. 


CHAPTER 7 


CORRELATION 


7.1. Introduction 


Hitherto we have considered the description of a single variable. We now 
approach the problem of describing the degree of simultaneous or con- 
comitant variation of two variables. The data under consideration, some- 
times called bivariate data, consist of pairs of measurements. The data, for 
example, may be measures both of height and weight for a group of school 
children, or measures both of intelligence and scholastic performance for a 
group of university students, or error scores for a group of experimental 
animals in running two different mazes. The essential feature of the data is 
that one observation can be paired with another observation for each member 
of the group. The study of this type of data has two closely related aspects, 
correlation and prediction. Correlation is concerned with describing the 
degree of relation between variables. Prediction is concerned with estimating 
one variable from a knowledge of another. We shall use for illustration the 
record of scores obtained on a psychological test administered to students 
entering university and examination marks obtained by these same students 
at the end of the first year of university work. The investigator may concern 
himself with obtaining a simple summary description of the degree of relation 
or correlation between test scores and examination marks. On the other 
hand, he may focus attention on the prediction of examination marks from a 
knowledge of psychological test scores, his purpose being to use psychological 
test scores to provide estimates, on university entrance, of subsequent 
scholastic performance. 

Historically, the study of the prediction of one variable from a knowledge 
of another preceded the development of measures of correlation. In the 
year 1885 Francis Galton published a paper called Regression towards 
Mediocrity in Hereditary Stature. Galton was interested in predicting the 
physical characteristics of offspring from a knowledge of the physical charac- 
teristics of their parents. He observed, for example, that the offspring of 
tall parents tended on the average to be shorter than their parents, whereas 
the offspring of short parents tended on the average to be taller than their 
parents. He used the word “regression” to refer to this effect. In modern 
statistics the term regression no longer has the biological implication assigned 
to it by Galton. In general, regression has to do with the prediction of one 
variable from a knowledge of another. Karl Pearson extended Galton's ideas 
of regression and developed the methods of correlation extensively used today. 

The most widely used measure of correlation is the Pearson product- 
moment correlation coefficient. This measure is used where the variables are 
quantitative, that is, of the interval or ratio type. Other varieties of cor- 
relation have been developed for use with nominal and ordinal variables. 
One measure commonly used to describe the relationship between two 
nominal variables is the contingency coefficient. Methods used with ordinal 
variables are called rank-order correlation methods. These special types of 
correlation will be discussed in later chapters. 

In this chapter we shall present a discussion of correlation and proceed in 
Chap. 8 to a discussion of prediction and its relation to correlation. The 
reader will bear in mind that correlation and prediction are two closely 
related topics. Certain topics pertaining to the interpretation of the cor- 
relation coefficient and assumptions underlying its use can only be discussed 
following a consideration of prediction. 


7.2. Relations between Paired Observations 


Consider a group comprised of N members. Denote these by $A_1, A_2, 
A_3, \ldots, A_N$. Measurements are available on each member on two 
variables, X and Y. The data may be represented symbolically as follows: 

               Measurement 
    Members      X      Y 
      A_1       X_1    Y_1 
      A_2       X_2    Y_2 
      A_3       X_3    Y_3 
      ...       ...    ... 
      A_N       X_N    Y_N 


Let us assume that measurements have been arranged in order of magnitude 
on X extending from $X_1$, the highest, to $X_N$, the lowest, measurement. 
Given this arrangement on X, we may consider the possible arrangements of 
Y with respect to X. Consider an arrangement where the values of Y are 
in order of magnitude extending from the highest to the lowest. Thus the 
member who is highest on X is also highest on Y, the member who is next 
highest on X is next highest on Y, and so on, until the member who is lowest 
on X is also lowest on Y. This situation represents the maximum positive 
relation between the two variables. Consider now an arrangement where the 
values of Y are reversed so that $Y_1$ is the lowest and $Y_N$ is the highest. The 
member who is highest on X is lowest on Y, the member who is next highest 
on X is next lowest on Y, and so on, until the member who is lowest on X is 
highest on Y. This situation represents the maximum negative relation 
between the variables. Consider a situation where the arrangement of Y is 
strictly random in relation to X. Values of Y may be inserted in a hat, 
shuffled, drawn at random, and paired with values of X. This is a situation 


Fig. 7.1a. High positive correlation. Fig. 7.1b. Low positive correlation. 


of independence. The two sets of variate values bear a random relation to 
each other. Under this arrangement we may state that no relation exists 
between X and Y. Between the two extreme arrangements, representing 
the maximum positive and negative relation, we may consider arrangements 
which represent varying degrees of relation in either a positive or negative 
direction. To illustrate, let us assume that the values of X for the members 
$A_1$, $A_2$, $A_3$, $A_4$, and $A_5$ are the integers 5, 4, 3, 2, and 1. If the values of Y 
are the same integers and are also arranged in the order 5, 4, 3, 2, and 1, we 
have a maximum positive relation. If values of Y are arranged in the order 
4, 5, 3, 2, and 1, we have clearly a high positive relation, although not the 
highest possible. If values of Y are arranged in the order 1, 2, 3, 4, and 5, 
we have a maximum negative relation. Again an arrangement on Y of the 
kind 1, 2, 4, 3, 5 would be high negative, although not the highest possible. 

Relations of the kind described above may be examined by plotting the 
paired measurements on graph paper, each pair of observations being 
represented by a point. Such a plotting of measurements is sometimes 
called a scatter diagram. Inspection of a scatter diagram yields an intuitive 
appreciation of the degree of relation between the two variables. Figure 
7.1 shows four such diagrams. 

Figure 7.1a is a graphical representation of a high positive relation. Note 
that the points fall very close to a straight line. If the points fall exactly on a 
straight line, a perfect positive relation exists between the variables. Figure 
7.1b shows a low positive relation. Figure 7.1c shows a relation which is 
more or less random. No systematic tendency is observed for high values of 
X to be associated with high values of Y and low values of X to be associated 
with low values of Y, or vice versa. Figure 7.1d shows a fairly high negative 
relation. Again, if all the points fall exactly along a straight line, a perfect 
negative relation exists. It is obvious that between the two extremes of a 
perfect positive and a perfect negative relation an indefinitely large number 
of possible arrangements of points may occur representing an indefinitely 
large number of possible relations between the two variables. 


Fig. 7.1c. Zero correlation. Fig. 7.1d. Negative correlation. 


7.3. The Correlation Coefficient 


Measures of correlation are conventionally defined to take values ranging 
from -1 to +1. A value of -1 describes a perfect negative relation. All 
points lie on a straight line, and X decreases as Y increases. A value of +1 
describes a perfect positive relation. All points lie on a straight line, and X 
increases as Y increases. A value of 0 describes the absence of a relation. 
The variable X is independent of Y or bears a random relation to Y. Measures 
of correlation take positive values where the relation is positive and 
negative values where the relation is negative. 

The most commonly used measure of correlation is the Pearson product- 
moment correlation coefficient. Many forms of correlation are particular 
cases of this coefficient. Let X and Y be two sets of paired observations with 
standard deviations $s_x$ and $s_y$. We may represent the paired observations in 
standard-score form by taking deviations from the mean and dividing by the 
standard deviation. Thus 

$$z_x = \frac{X - \bar{X}}{s_x} \qquad z_y = \frac{Y - \bar{Y}}{s_y}$$

The standard scores have a mean of zero and a standard deviation of unity. 
The product-moment correlation coefficient, denoted by the letter r, is the 
average product of the standard scores. The formula for r in standard-score 
form is 

$$r = \frac{\sum z_x z_y}{N} \qquad (7.1)$$



Thus the correlation coefficient may be obtained by converting the two 
variables to standard-score form, summing their product, and dividing by N. 
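
The procedure just stated can be sketched as follows, applied to the five-member arrangements of Sec. 7.2. This is an illustration only, assuming Python's standard library; `standard_scores` and `r_from_standard_scores` are hypothetical helper names, and the standard deviation uses the divisor N.

```python
from statistics import fmean, pstdev

def standard_scores(values):
    """Deviations from the mean divided by the standard deviation (divisor N)."""
    mean, sd = fmean(values), pstdev(values)
    return [(v - mean) / sd for v in values]

def r_from_standard_scores(xs, ys):
    """Formula (7.1): the average product of paired standard scores."""
    zx, zy = standard_scores(xs), standard_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

x = [5, 4, 3, 2, 1]
arrangements = {
    "maximum positive": [5, 4, 3, 2, 1],
    "high positive": [4, 5, 3, 2, 1],
    "maximum negative": [1, 2, 3, 4, 5],
    "high negative": [1, 2, 4, 3, 5],
}
results = {name: r_from_standard_scores(x, y) for name, y in arrangements.items()}
for name, r in results.items():
    print(f"{name}: r = {r:+.2f}")
```

For these data the four arrangements give r = +1.00, +.90, -1.00, and -.90, in keeping with the verbal description of each.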

A brief and rather incomplete digression on the rationale underlying the 
above coefficient is appropriate here. Consider a set of paired observations 
in standard-score form. The sum of products of standard scores $\sum z_x z_y$ is 
readily observed to be a measure of the degree of relationship between the 
two variables. Let us consider the maximum and minimum values of 
$\sum z_x z_y$. This sum of products is observed to take its maximum possible value 
when (1) the values of $z_x$ and $z_y$ are in the same order and (2) in addition 
every value of $z_x$ is equal to the value of $z_y$ with which it is paired, the two 
sets of paired standard scores being identical. If the paired standard scores 
are plotted on graph paper, all points will fall exactly along a straight line 
with positive slope. Since all pairs of observations are such that $z_x = z_y$, 
we may write $z_x z_y = z_x^2 = z_y^2$ and $\sum z_x z_y = \sum z_x^2 = \sum z_y^2$. The variance of 
standard scores is equal to unity; that is, $\sum z_x^2/N = \sum z_y^2/N = 1$. Hence 
$\sum z_x^2 = \sum z_y^2 = N$. Thus we observe that the maximum possible value of 
$\sum z_x z_y$ is equal to N. Similarly, $\sum z_x z_y$ will take its minimum possible value 
when (1) the values of $z_x$ and $z_y$ are in inverse order and (2) in addition every 
value of $z_x$ has the same absolute numerical value as the $z_y$ with which it is 
paired, but differs in sign. This minimum value of $\sum z_x z_y$ is readily shown to 
be equal to $-N$. Graphically, all points will fall exactly along a straight 
line with negative slope. When $z_x$ and $z_y$ bear a random relation to each 
other, the expected value of $\sum z_x z_y$ will be zero. We may define a coefficient 
of correlation as the ratio of the observed value of $\sum z_x z_y$ to the maximum 
possible value of this quantity; that is, r is defined as $\sum z_x z_y / N$. Since 
$\sum z_x z_y$ has a range extending from N to $-N$, the coefficient r will extend 
from +1 to -1. We note that a term of the kind $z_x z_y$ when viewed geo- 
metrically is an area, $\sum z_x z_y$ is a sum of areas, and $\sum z_x z_y / N$, or r, is an average 
area. The reader will note that for any particular set of paired standard 
scores the maximum and minimum values of $\sum z_x z_y$ obtained by arranging 
the paired scores in direct and inverse order are not necessarily N and $-N$. 
A maximum value equal to N will occur only when the paired observations 
have the characteristic that every value of $z_x$ is equal to the value of $z_y$ with 
which it is paired. A minimum value of $-N$ will occur only when every 
value of $z_x$ is equal to $z_y$ in absolute value, but differs in sign. When the 
data do not have these characteristics, the limits of the range of r, for the 
particular set of paired observations under consideration, will be less than 
+1 and greater than -1. 
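
The closing remark, that the attainable limits of r for a given set of scores may fall short of +1 and -1, can be illustrated with a small sketch. The scores below are hypothetical and chosen only so that the two sets of standard scores cannot be made identical; `sum_of_products` is an illustrative helper, not a function from the text.

```python
from statistics import fmean, pstdev

def sum_of_products(xs, ys):
    """Sum of products of paired standard scores (standard deviations use divisor N)."""
    mx, my, sx, sy = fmean(xs), fmean(ys), pstdev(xs), pstdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 10]   # hypothetical scores with one extreme value
y = [1, 2, 3, 4, 5]    # hypothetical scores, evenly spaced
n = len(x)

direct = sum_of_products(sorted(x), sorted(y))                 # same order
inverse = sum_of_products(sorted(x), sorted(y, reverse=True))  # inverse order
identical = sum_of_products(y, y)                              # every z_x equals its z_y

# Dividing each sum by N gives the corresponding value of r.
print(round(direct / n, 3), round(inverse / n, 3), round(identical / n, 3))
```

Here the direct and inverse orderings give r of about +.894 and -.894, whereas pairing a set of scores with itself gives a sum of products equal to N and hence r = 1.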




7.4. Calculation of the Correlation Coefficient from 
Ungrouped Data 


The formula for the correlation coefficient in standard-score form is 
$r = \sum z_x z_y / N$. The calculation of a correlation coefficient using this formula 
is somewhat laborious because it requires the conversion of all values to 
standard scores. Since $z_x = (X - \bar{X})/s_x$ and $z_y = (Y - \bar{Y})/s_y$, by sub- 
stitution we may write the formula for the correlation coefficient in deviation- 
score form. Thus 

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N s_x s_y} = \frac{\sum xy}{N s_x s_y} \qquad (7.2)$$

where x and y are deviations from the means X̄ and Ȳ, respectively. 

The above formula for the correlation coefficient may be used for computa- 
tional purposes. The calculation is illustrated in Table 7.1. The first two 


TABLE 7.1 
CALCULATION OF THE CORRELATION COEFFICIENT FROM UNGROUPED DATA USING 
DEVIATION SCORES 

    (1)    (2)    (3)    (4)    (5)    (6)    (7) 
     X      Y      x      y     x^2    y^2    xy 
     5      1     -1     -3      1      9     +3 
    10      6     +4     +2     16      4     +8 
     5      2     -1     -2      1      4     +2 
    11      8     +5     +4     25     16    +20 
    12      5     +6     +1     36      1     +6 
     4      1     -2     -3      4      9     +6 
     3      4     -3      0      9      0      0 
     2      6     -4     +2     16      4     -8 
     7      5     +1     +1      1      1     +1 
     1      2     -5     -2     25      4    +10 
    60     40      0      0    134     52     48 
 X̄ = 6.0  Ȳ = 4.0              Σx^2   Σy^2   Σxy 

$$s_x = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\frac{134}{10}} = \sqrt{13.4} = 3.66$$
$$s_y = \sqrt{\frac{\sum y^2}{N}} = \sqrt{\frac{52}{10}} = \sqrt{5.2} = 2.28$$
$$r = \frac{\sum xy}{N s_x s_y} = \frac{48}{10 \times 3.66 \times 2.28} = +.58$$


columns contain the paired observations on X and Y. These columns are 
summed and divided by N to obtain the means X̄ and Ȳ. Column 3 contains 
the deviations from the mean of X, and col. 4 the deviations from the mean of 
Y. Columns 5 and 6 contain the squares of these deviations. These columns 
are summed to obtain $\sum x^2$ and $\sum y^2$. These values are used to calculate 
$s_x$ and $s_y$. Column 7 contains the products of x and y, and this column is 
summed to obtain $\sum xy$. The correlation coefficient in this example is +.58. 
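
The column-by-column procedure of Table 7.1 can be written out as a short computation. This is a minimal sketch using the ten pairs from the table; the variable names are illustrative.

```python
from math import sqrt

# Columns 1 and 2 of Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]
N = len(X)

mean_x, mean_y = sum(X) / N, sum(Y) / N

# Columns 3 and 4: deviations from the means.
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

# Sums of columns 5, 6, and 7.
sum_x2 = sum(d * d for d in x)
sum_y2 = sum(d * d for d in y)
sum_xy = sum(a * b for a, b in zip(x, y))

s_x = sqrt(sum_x2 / N)
s_y = sqrt(sum_y2 / N)
r = sum_xy / (N * s_x * s_y)   # formula (7.2)

print(sum_x2, sum_y2, sum_xy, round(s_x, 2), round(s_y, 2), round(r, 2))
```

Carried without intermediate rounding, r is about .575, which rounds to the +.58 given in the table.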

Since $s_x = \sqrt{\sum x^2/N}$ and $s_y = \sqrt{\sum y^2/N}$, we may obtain by substitution 
the formula 

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \qquad (7.3)$$

If the values of $s_x$ and $s_y$ are not required, although they usually are, this 
formula simplifies the calculation. 


TABLE 7.2 
CALCULATION OF THE CORRELATION COEFFICIENT FROM UNGROUPED DATA USING 
RAW SCORES 

    (1)    (2)    (3)    (4)    (5) 
     X      Y     X^2    Y^2    XY 
     5      1     25      1      5 
    10      6    100     36     60 
     5      2     25      4     10 
    11      8    121     64     88 
    12      5    144     25     60 
     4      1     16      1      4 
     3      4      9     16     12 
     2      6      4     36     12 
     7      5     49     25     35 
     1      2      1      4      2 
    60     40    494    212    288 
    ΣX     ΣY    ΣX^2   ΣY^2   ΣXY 

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
$$= \frac{10 \times 288 - 60 \times 40}{\sqrt{(10 \times 494 - 60^2)(10 \times 212 - 40^2)}} = \frac{480}{\sqrt{1{,}340 \times 520}} = +.58$$


For certain purposes it is desirable to express the formula for the correlation 
in terms of the raw scores or the original observations. This formula is as 
follows: 

$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} \qquad (7.4)$$


This is one of the more convenient formulas to use where a calculating 
machine is available. Some modern calculating machines are so designed 
that pairs of observations may be entered successively on the machine and 
the terms $\sum XY$, $\sum X^2$, $\sum Y^2$, $\sum X$, and $\sum Y$ obtained in a single operation. 
Where a calculating machine is not available, this formula usually involves 
rather large and unwieldy numbers and the formula in deviation form may 
be preferred. 

The application of the formula for computing the correlation coefficient 
from raw scores is illustrated in Table 7.2. The first two columns contain 
the paired observations on X and Y. These columns are summed and 
divided by N to obtain X̄ and Ȳ. Columns 3 and 4 contain the squares of the 
observations, and these are summed to obtain $\sum X^2$ and $\sum Y^2$. Column 5 
contains the product terms XY, and the sum of this column is $\sum XY$. The 
correlation is +.58, which checks with the value obtained by the previous 
method using deviation scores. 
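
The raw-score formula lends itself to a single pass over the pairs, much as on a calculating machine that accumulates all five sums at once. The following sketch assumes the data of Table 7.2; the function name `r_from_raw_scores` is hypothetical.

```python
from math import sqrt

def r_from_raw_scores(pairs):
    """Formula (7.4), accumulating N and the five sums in one pass over the pairs."""
    n = sum_x = sum_y = sum_x2 = sum_y2 = sum_xy = 0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_x2 += x * x
        sum_y2 += y * y
        sum_xy += x * y
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

# Columns 1 and 2 of Table 7.2.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]

r = r_from_raw_scores(zip(X, Y))
print(round(r, 2))
```

With integer scores the sums are exact, so rounding enters only in the final square root and division.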


7.5. Bivariate Frequency Distributions 


In Chap. 2 we discussed the construction of frequency distributions for a 
single variable. A frequency distribution was defined as an arrangement of 
the data showing the frequency of occurrence of the observations within 
defined ranges of the values of the variable, the defined ranges being the class 
intervals. Where one variable only is involved, the distribution may be 
spoken of as univariate. The frequency-distribution idea may be readily 
extended to two variable situations. A frequency distribution involving 
two variables is known as a bivariate frequency distribution. 

A bivariate frequency distribution is a table comprised of a number of 
rows and columns. The columns correspond to class intervals of the X 
variable and the rows to class intervals of the Y variable. Each pair of 
observations is entered as a tally in its appropriate cell. To illustrate, Table 
7.3 shows a bivariate frequency distribution for a set of paired observations, 
these being scores on two forms of a French reading test. In constructing 
such a distribution a person who makes a score of 27 on Form A and a score 
of 31 on Form B is entered as a tally in the cell that is common to the row 
corresponding to the class interval 25 to 29 on Form A and the column cor- 
responding to the class interval 30 to 34 on Form B. Similarly, every pair of 
observations is entered as a tally in its appropriate cell. The tallies in each 
cell are then counted, and their number recorded. These numbers are the 
bivariate frequencies. By summing the bivariate frequencies in the rows we 
obtain, as shown in Table 7.3, the frequency distribution for the Y variable, 
and by summing the columns we obtain the frequency distribution for the 
X variable. The separate frequency distributions of X and Y are usually 
written at the bottom and to the right of the table. In the selection of class 
intervals for X and Y the usual conventions regarding class intervals apply. 
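
The tallying procedure can be sketched in code. The example below is illustrative only: apart from the pair 27 on Form A and 31 on Form B mentioned in the text, the scores are hypothetical, as are the helper names `class_interval` and `bivariate_frequencies`. A class interval of 5 starting at 0 is assumed.

```python
from collections import Counter

def class_interval(score, width=5):
    """Return the (lower, upper) limits of the class interval holding a score."""
    lower = (score // width) * width
    return (lower, lower + width - 1)

def bivariate_frequencies(pairs, width=5):
    """Tally each (x, y) pair into the cell for its two class intervals."""
    return Counter(
        (class_interval(x, width), class_interval(y, width)) for x, y in pairs
    )

# (Form B, Form A) scores; the first pair is the one in the text, the rest are hypothetical.
pairs = [(31, 27), (33, 26), (12, 14), (18, 21), (31, 29), (7, 3)]
cells = bivariate_frequencies(pairs)

# Marginal distributions: f_x sums each column, f_y sums each row.
f_x, f_y = Counter(), Counter()
for (x_interval, y_interval), frequency in cells.items():
    f_x[x_interval] += frequency
    f_y[y_interval] += frequency

print(cells[((30, 34), (25, 29))], dict(f_x), dict(f_y))
```

The pair (31, 27) lands in the cell common to the column 30 to 34 on Form B and the row 25 to 29 on Form A, as described above.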




TABLE 7.3* 
BIVARIATE FREQUENCY DISTRIBUTION FOR TWO FORMS OF FRENCH READING TEST 

[Cell tallies not reproduced. Columns: X (Form B), class intervals 0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, followed by the row totals f_y. Rows: Y (Form A), class intervals from 35-39 downward.] 

* Tables 7.3 and 7.4 are reproduced from R. W. B. Jackson and George A. Ferguson, 
Manual of educational statistics, University of Toronto, Department of Educational 
Research, Toronto, 1942. 


7.6. Calculating a Correlation Coefficient from a Bivariate 
Frequency Distribution 


The calculation of the correlation coefficient from data grouped in the form 
of a bivariate frequency distribution is illustrated in Table 7.4. The two 
variables are the scores obtained by 80 children on a group intelligence test 
and on an arithmetic achievement test. Both variables have been grouped 
with a class interval of 10 points of score. Each pair of scores has been 
entered in the appropriate cell in the table, and the cell frequencies obtained. 
The distribution of scores on the arithmetic test is given in the column headed 
$f_y$ to the right of the table. The distribution of scores on the intelligence 
test is given in the row $f_x$ at the bottom of the table. 

Computation variables x' and y' are now introduced. The use of such 
variables has been described previously in the calculation of the mean and 
standard deviation from grouped data. An arbitrary origin is selected. 




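TABLE 7.4 
CALCULATION OF THE CORRELATION COEFFICIENT FROM GROUPED DATA 

[Cell frequencies not reproduced. Columns: X (intelligence test), class intervals 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, followed by the computation columns f_y, y', fy', fy'^2, Σx', and y'Σx'. Rows: Y (arithmetic test), class intervals of 10 points from 50-59 downward.] 

$$r = \frac{N\sum x'y' - \sum fx' \sum fy'}{\sqrt{[N\sum fx'^2 - (\sum fx')^2][N\sum fy'^2 - (\sum fy')^2]}} = \frac{80 \times 137 - 23 \times 27}{\sqrt{(80 \times 217 - 23^2)(80 \times 145 - 27^2)}} = .764$$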


The numbers +1, +2, +3 and -1, -2, -3, and so on, are written opposite 
the intervals above and below the arbitrary origin. The frequencies $f_y$ are 
multiplied by the computation variable y' to obtain $fy'$ and multiplied again 
by y' to obtain $fy'^2$. The columns are summed to obtain $\sum fy'$ and $\sum fy'^2$. Similarly, 
$\sum fx'$ and $\sum fx'^2$ are obtained. 

To calculate the correlation coefficient we require $\sum x'y'$, that is, the sum of 
products of scores on the two variables expressed as deviations from an 
arbitrary origin in units of class interval. We proceed in the following 
manner. Consider the computation variable x' at the bottom of Table 7.4. 
Multiply the frequencies in each column of the table by the value of the 
computation variable directly beneath it; that is, in the first column each cell 
frequency is multiplied by -3, in the second column by -2, and so on. The 
products obtained by this procedure are recorded in the upper right-hand 
corner of each cell. For example, the number 2 in the bottom cell of the 
second column is multiplied by -2, and -4 obtained, which is written in the 
upper right corner of that cell. Similarly, in the cell above this, -8 is 
obtained by multiplying 4 by -2, and so on. Some of the products thus 
obtained are positive, and others are negative. These products are now 
summed along the rows with due regard to sign. The sums are shown in the 
$\sum x'$ column of Table 7.4. Each value in the $\sum x'$ column is the sum of scores 
on the intelligence test, expressed as deviations from an arbitrary origin in 
units of class interval, of all persons who made scores on the arithmetic test 
falling within a given class interval. 

Now sum the $\sum x'$ column. The sum in this column, 23 in the present 
example, is equal to the sum of the $fx'$ row, or $\sum fx'$ at the bottom of the table. 
The correspondence of these values is a check on the accuracy of the calcula- 
tion thus far. 

The next step is to multiply the values in the $\sum x'$ column by the computa- 
tion variable y', obtaining thereby the $y'\sum x'$ column. The sum of the $y'\sum x'$ column, 
which in the present example is 137, is the sum of products $\sum x'y'$. 

We now have all the information necessary to calculate the correlation 
coefficient. A suitable formula for calculating the correlation coefficient 
from data grouped in the form of a bivariate frequency distribution is given 
below: 

$$r = \frac{N\sum x'y' - \sum fx' \sum fy'}{\sqrt{[N\sum fx'^2 - (\sum fx')^2][N\sum fy'^2 - (\sum fy')^2]}} \qquad (7.5)$$

where x' and y' are deviations from the arbitrary origin in units of class 
interval, and N is as usual the number of pairs of observations. The applica- 
tion of this formula is shown in Table 7.4. The correlation coefficient is .764. 

The calculation of correlation coefficients by the above method is admit- 
tedly tedious without the assistance of a calculating machine. Great care 
must be exercised to ensure the accuracy of the calculation. 
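
The grouped-data calculation can be sketched as follows. The first call uses the sums reported for Table 7.4; the small table of cell frequencies that follows is hypothetical and serves only to show how the sums arise from the cells. The function names `grouped_r` and `sums_from_cells` are illustrative.

```python
from math import sqrt

def grouped_r(n, sum_xy, sum_fx, sum_fy, sum_fx2, sum_fy2):
    """Formula (7.5), from sums in units of class interval about an arbitrary origin."""
    numerator = n * sum_xy - sum_fx * sum_fy
    denominator = sqrt((n * sum_fx2 - sum_fx**2) * (n * sum_fy2 - sum_fy**2))
    return numerator / denominator

# Sums reported for Table 7.4.
r_table = grouped_r(n=80, sum_xy=137, sum_fx=23, sum_fy=27, sum_fx2=217, sum_fy2=145)

def sums_from_cells(cells):
    """cells maps (x', y') to a cell frequency; returns the arguments of grouped_r."""
    n = sum(cells.values())
    sum_fx = sum(f * xp for (xp, _), f in cells.items())
    sum_fy = sum(f * yp for (_, yp), f in cells.items())
    sum_fx2 = sum(f * xp * xp for (xp, _), f in cells.items())
    sum_fy2 = sum(f * yp * yp for (_, yp), f in cells.items())
    sum_xy = sum(f * xp * yp for (xp, yp), f in cells.items())
    return n, sum_xy, sum_fx, sum_fy, sum_fx2, sum_fy2

# Hypothetical 3 x 3 bivariate frequency table, keyed by (x', y').
cells = {(-1, -1): 4, (0, -1): 2, (-1, 0): 1, (0, 0): 6, (1, 0): 2, (0, 1): 1, (1, 1): 4}
r_small = grouped_r(*sums_from_cells(cells))

print(round(r_table, 3), round(r_small, 3))
```

Multiplying each cell frequency by x' and y' directly, as `sums_from_cells` does, gives the same sum of products as the row-by-row procedure with the $\sum x'$ and $y'\sum x'$ columns.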


7.7. The Variance of Sums and Differences 


Let X and Y be two sets of measurements for the same group of individuals. 
These, for example, may be marks on mathematics and history examinations 
for a group of university students. What is the variance of X + Y? If 
mathematics and history marks are added together, what is the variance of 
the sums? 

The sum of X and Y is X + Y. The mean of the sum of X and Y is 
X̄ + Ȳ, or the sum of the two means. We may then write the variance of 
sums as follows: 

$$\begin{aligned}
s_{x+y}^2 &= \frac{\sum [(X + Y) - (\bar{X} + \bar{Y})]^2}{N} \\
&= \frac{\sum [(X - \bar{X}) + (Y - \bar{Y})]^2}{N} \\
&= \frac{\sum x^2}{N} + \frac{\sum y^2}{N} + \frac{2\sum xy}{N} \\
&= s_x^2 + s_y^2 + 2rs_xs_y \qquad (7.6)
\end{aligned}$$




The variance of the sum of X and Y is the sum of the two variances plus 
$2rs_xs_y$. If the correlation between the two variables is zero, then $2rs_xs_y = 0$ 
and the variance of sums is simply the sum of the two variances. Terms of 
the kind $rs_xs_y$ are sometimes called covariance terms, or covariances. 

Similarly, the variance of the differences between X and Y, the variance of 
X - Y, is readily shown to be 

$$s_{x-y}^2 = s_x^2 + s_y^2 - 2rs_xs_y \qquad (7.7)$$

The variance of differences is the sum of the two variances minus the covari- 
ance term $2rs_xs_y$. 

Alternative formulas for the correlation coefficient may be obtained from 
the formulas for the variance of sums and differences by writing these explicit 
for r. From the variance of sums we obtain 

$$r = \frac{s_{x+y}^2 - s_x^2 - s_y^2}{2s_xs_y} \qquad (7.8)$$

From the variance of differences we obtain 

$$r = \frac{s_x^2 + s_y^2 - s_{x-y}^2}{2s_xs_y} \qquad (7.9)$$

These formulas can be readily adapted for computational purposes. 
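
As a numerical check on formulas (7.6) to (7.9), the sketch below reuses the ten pairs of Table 7.1. It assumes Python's standard-library `statistics.pvariance`, which divides by N as the text does; the variable names are illustrative.

```python
from math import sqrt
from statistics import pvariance

# Paired observations from Table 7.1.
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 1]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]

var_x, var_y = pvariance(X), pvariance(Y)
var_sum = pvariance([x + y for x, y in zip(X, Y)])    # variance of X + Y
var_diff = pvariance([x - y for x, y in zip(X, Y)])   # variance of X - Y

two_sx_sy = 2 * sqrt(var_x * var_y)
r_from_sums = (var_sum - var_x - var_y) / two_sx_sy    # formula (7.8)
r_from_diffs = (var_x + var_y - var_diff) / two_sx_sy  # formula (7.9)

print(round(var_sum, 1), round(var_diff, 1))
print(round(r_from_sums, 3), round(r_from_diffs, 3))
```

Both routes give r of about .575, the value obtained earlier from deviation scores and from raw scores.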


EXERCISES 


1. Would you expect the correlation between the following to be positive, negative, or 
about zero? (a) The intelligence of parents and their offspring, (b) scholastic success 
and annual income 10 years after graduation, (c) age and mental ability, (d) marks on 
examinations in physics and mathematics, (e) wages and the cost of living, (f) birth rate 
and the numerosity of storks, (g) scores on a dominance-submission test for husbands 
and their wives. 

2. The following are paired measurements: 


x SHUN 7 
Y ЗЕ. 8. cB 


an 
m 


Compute the correlation between X and Y. 
3. Show that 
у му_ 
№, улху? 


4. When N = 2, what are the possible values of the correlation coefficient?

5. The correlation coefficient is not necessarily equal to +1 when the paired measurements
are in exactly the same rank order. Discuss.




6. Prepare a bivariate frequency distribution for the following data and compute the corre- 
lation coefficient. 


 X   Y      X   Y      X   Y
22  18     19  25     11  17
15  16      7  36      5   6
 9  31      6  27     26  45
 7   8     46  45     19  30
 4   2      1  18      8  18
45  36     27  18      1   3
19  12     19  37      9   7
26  16     36  42     18  28
35  47     25  20     46  21
49  22     10  12      9  25


7. Show that 
\[ s_{x-y}^2 = s_x^2 + s_y^2 - 2rs_xs_y \]


8. Write a formula for the variance of the sum of three variables. 


CHAPTER 8 


PREDICTION IN RELATION TO CORRELATION 


8.1. Introduction 


Psychologists and educationists are frequently concerned with problems of 
prediction. The educational psychologist is interested in predicting the 
scholastic performance of a child from a knowledge of intelligence test scores. 
The industrial psychologist in the selection of an individual for a particular 
type of employment makes a prediction about the subsequent job performance
of that individual from information available at the time of selection.
The clinical psychologist may direct his attention to predicting the
patient's receptivity to treatment from information obtained prior to 
treatment. In many areas of human endeavor predictions about the 
subsequent behavior of individuals are required. A somewhat elaborate 
statistical technology has evolved for dealing with the prediction problem. 
In this chapter we shall restrict attention to the simplest aspect of prediction, 
the prediction of one variable from a knowledge of another. 

Prediction and correlation are closely related topics, and an understanding 
of one requires an understanding of the other. The presence of a zero cor- 
relation between two variables X and Y may usually be interpreted to mean 
that they bear a random relation to each other. A knowledge of X tells us 
nothing about Y, and a knowledge of Y tells us nothing about X. In predicting
X from Y or Y from X no prediction better than a random guess is possible.
The presence of a nonzero correlation between X and Y implies that if we
know something about X we know something about Y, and vice versa. If 
knowing X implies some knowledge of Y, a prediction of Y from X is possible 
which is better than a random guess about Y made in the absence of a knowl- 
edge of X. The greater the absolute value of the correlation between X and 
Y, the more accurate the prediction of one variable from the other. If the 
correlation between X and Y is either —1 or +1, perfect prediction is possible. 


8.2. The Linear Regression of Y on X 


Any set of paired observations may be plotted on graph paper, each pair 
of observations being represented by a point. Consider the data shown in 
Table 8.1, cols. 2 and 3. These columns contain intelligence quotients and
reading-test scores for a group of 18 school children. These data are plotted
in graphical form in Fig. 8.1. While the arrangement of points when plotted 
graphically shows considerable irregularity, we observe a tendency for 
reading-test scores to increase as intelligence quotients increase. 


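TABLE 8.1
CALCULATIONS FOR REGRESSION LINE OF Y ON X FOR UNGROUPED DATA*

 (1)      (2)      (3)        (4)        (5)        (6)
Pupil     IQ     Reading      X²         XY       Expected
           X     score Y                          reading
                                                  score Y'
   1      118       66      13,924      7,788        68
   2       99       50       9,801      4,950        55
   3      ...      ...        ...        ...         68
   4      ...      ...        ...        ...         70
   5      ...      ...        ...        ...         71
   6      ...      ...        ...        ...         54
   7      ...      ...        ...        ...         77
   8      ...      ...        ...        ...         70
   9      ...      ...        ...        ...         61
  10      ...      ...        ...        ...         63
  11      ...      ...        ...        ...         68
  12      ...      ...        ...        ...         64
  13      ...      ...        ...        ...         65
  14      ...      ...        ...        ...         63
  15      ...      ...        ...        ...         60
  16      ...      ...        ...        ...         57
  17      ...      ...        ...        ...         65
  18      ...       57      10,201       ...         57
Sum     2,024    1,155     228,978    130,806


* Reproduced from R. W. B. Jackson and George A. Ferguson, Manual of educational
statistics, University of Toronto, Department of Educational Research, Toronto, 1942.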


Let us suppose that we are given a child's intelligence quotient only and 
are required to predict his reading-test scores. How shall we proceed?
Clearly, the data show considerable irregularity. An exact correspondence 
between the two sets of scores does not exist. In this situation we may 
proceed by fitting a straight line to the data. This straight line provides an 
average statement about the change in one variable with change in the 
other. It describes the trend in the data and is based on all the observations. 
If, then, we are given a child’s intelligence quotient and are required to 
predict his reading-test score, we use the properties of the line. The method 
used in fitting a line to a set of points in a situation of this kind is the method
of least squares. If our interest resides in predicting Y from X, the method of
least squares locates the line in a position such that the sum of squares of 
distances from the points to the line taken parallel to the Y axis is a minimum. 
This line is known as the regression line of Y on X. 

The general equation of any straight line is given by 


\[ Y = bX + a \tag{8.1} \]


The quantity a is a constant. It is the distance on the Y axis from the origin
to the point where the line cuts the Y axis. It is the value of Y corresponding
to X = 0. If we substitute X = 0 in the equation for a straight line, we
observe that Y = a. The quantity b is the slope of the line. The slope of
any line is simply the ratio of the distance in a vertical direction to the
distance in a horizontal direction, as illustrated in Fig. 8.2. The slope describes
the rate of increase in Y with increase in X. If a and b are known, the location
of the line is uniquely fixed and for any given value of X we can compute
a corresponding value of Y.

Fig. 8.1. Scatter diagram for data of Table 8.1. Horizontal axis: X, IQ; vertical axis: Y, reading score.

Where the regression line of Y on X is fitted by the method of least squares,
the slope of the line $b_{yx}$ and the point where the line cuts the Y axis $a_{yx}$ may
be calculated by the formulas

\[ b_{yx} = \frac{\sum XY - (\sum X \sum Y/N)}{\sum X^2 - [(\sum X)^2/N]} \tag{8.2} \]

\[ a_{yx} = \frac{\sum Y - b_{yx}\sum X}{N} \tag{8.3} \]

The quantity $\sum X$ is the sum of X, $\sum Y$ is the sum of Y, $\sum XY$ is the sum of
products of X and Y, $\sum X^2$ is the sum of squares of X, and N is the number of
cases.


Fig. 8.2. The slope of a line.


To illustrate, consider the data of Table 8.1. Columns 2 and 3 provide
intelligence quotients and reading scores for the 18 school children. Column
4 provides the values $X^2$, and col. 5 the products XY. Summing the columns,
we obtain

\[
\begin{aligned}
\sum XY &= 130{,}806 \\
\sum X &= 2{,}024 \\
\sum Y &= 1{,}155 \\
\sum X^2 &= 228{,}978 \\
N &= 18
\end{aligned}
\]


Applying formulas (8.2) and (8.3), we have

\[ b_{yx} = \frac{130{,}806 - (2{,}024 \times 1{,}155/18)}{228{,}978 - [(2{,}024)^2/18]} = .6708 \]

\[ a_{yx} = \frac{1{,}155 - (.6708 \times 2{,}024)}{18} = -11.25 \]

The regression line of Y on X is then described by the equation

\[ Y' = .6708X - 11.25 \]


The symbol Y' has been introduced to refer to the estimated value of Y, that
is, the value of Y estimated from a knowledge of X. Y' is a distance from
the X axis to the line corresponding to any value of X. By substituting any
value of X in the formula we obtain Y', the estimated value of Y. Column
6 of Table 8.1 shows the estimated reading-test scores obtained by applying
this regression equation.
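
The arithmetic of formulas (8.2) and (8.3) can be retraced with a few lines of Python. This is only a sketch, not part of the original text: the function names `regression_y_on_x` and `predict_reading_score` are chosen here for illustration, and the column sums are those quoted above for Table 8.1.

```python
def regression_y_on_x(sum_x, sum_y, sum_xy, sum_x2, n):
    """Return (slope, intercept) of the least-squares line of Y on X."""
    slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

# Column sums for the 18 school children of Table 8.1.
b_yx, a_yx = regression_y_on_x(
    sum_x=2024, sum_y=1155, sum_xy=130806, sum_x2=228978, n=18
)

def predict_reading_score(iq):
    # Estimated reading score Y' for a given intelligence quotient X.
    return b_yx * iq + a_yx

print(round(b_yx, 4))
print(round(predict_reading_score(118)), round(predict_reading_score(99)))
```

The slope rounds to .6708, the intercept agrees with the value in the text to within rounding, and the predictions for the first two children (IQs of 118 and 99) round to the entries 68 and 55 in col. 6.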


8.3. The Linear Regression of X on Y


Above we have considered the regression of Y on X. The regression line 
has been located in order to minimize the sum of squares of the distances 
from the points to the line taken parallel to the Y axis. Given reading-test 
scores and intelligence quotients, we concerned ourselves with predicting 
reading-test scores from intelligence quotients. If, however, we wish to 
predict intelligence quotients from reading-test scores, a different regression 
line is used. This is the regression line of X on Y. This line is located in a 
position such as to minimize the sum of squares of the distances from the 
points to the line parallel to the X axis. We see, therefore, that two regression 
lines may be fitted to any set of paired observations, the regression line of 
Y on X and the regression line of X on Y. The regression of Y on X is used 
in predicting Y from X. The regression of X on Y is used in predicting 
X from Y. These two lines will differ except in the particular case where all 
the points fall exactly on a straight line. Under this circumstance the two 
regression lines coincide. 

The formula for the regression line of X on Y is given by 


\[ X' = b_{xy}Y + a_{xy} \tag{8.4} \]

The symbol X' is used to refer to the predicted value of X, the value estimated
from a knowledge of Y; $b_{xy}$ is the slope of the regression line; and $a_{xy}$ is the
point where the line intercepts the X axis. The values of $b_{xy}$ and $a_{xy}$ may
be calculated from the formulas

\[ b_{xy} = \frac{\sum XY - (\sum X \sum Y/N)}{\sum Y^2 - [(\sum Y)^2/N]} \tag{8.5} \]

and

\[ a_{xy} = \frac{\sum X - b_{xy}\sum Y}{N} \tag{8.6} \]


8.4. Regression Lines for a Bivariate Frequency Distribution 


Where data are grouped in the form of a bivariate frequency table the 
frequencies in each row or each column of the table constitute a frequency 
distribution. Table 8.2 is a bivariate frequency distribution for scores 
on a verbal intelligence test and the Binet intelligence test. We note, for 
example, that 104 individuals make scores between 100 and 109 on the Binet. 
The frequencies in the 100-109 column comprise the frequency distribution of 
scores on the verbal test for all individuals with IQ's between 100 and 109 on
the Binet. The mean score on the verbal test for these 104 individuals can
be readily calculated from the distribution in the 100-109 column. If we
know only that an individual's IQ falls between 100 and 109, the best estimate
we can make of his verbal-test score is that he is at the mean of those
individuals with IQ's between 100 and 109. The means for all column arrays
may be calculated. These are the mean verbal-test scores of the individuals
falling within particular class intervals on the Binet scale. A straight line
may be fitted to this set of means by the method of least squares. This line
is the regression of Y on X, the regression line used in predicting verbal-test
scores from Binet IQ's. Similarly, the means for the row arrays may be
calculated and a line fitted to these means by the method of least squares.
This line is the regression of X on Y, the regression line used in predicting
Binet IQ's from verbal-test scores. The two regression lines are shown in
Table 8.2.

TABLE 8.2
BIVARIATE FREQUENCY DISTRIBUTION OF SCORES OBTAINED BY 500 SCOTTISH SCHOOL BOYS ON A VERBAL INTELLIGENCE TEST AND THE BINET INTELLIGENCE TEST, SHOWING REGRESSION LINES OF Y ON X AND X ON Y*
[Table body not legible. Columns: X (Binet), class intervals 50-59 to 160-169. Line A: regression of Y on X. Line B: regression of X on Y.]

* From A. M. Macmeekan, The intelligence of a representative group of Scottish children, University of London Press, London. Reproduced with permission of The Scottish Council for Research in Education, Edinburgh, Scotland.


8.5. Relation of Regression to Correlation 


If all points in a scatter diagram fall exactly along a straight line, the two
regression lines coincide. Perfect prediction is possible. The correlation
coefficient in this case is either —1 or +1. Where the correlation departs 
from either —1 or +1 the two regression lines have an angular separation. 
In general, as the degree of relationship between two variables decreases, 
the angular separation between the two regression lines increases. Where 
no relationship exists at all, the two variables being independent, the two 
regression lines are at right angles to each other. 

A simple relationship exists between the correlation coefficient and the 
slopes of the two regression lines. The slopes of the regression lines when 
expressed in deviation-score form are given by 


\[
b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{Ns_x^2}
\qquad
b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{Ns_y^2}
\tag{8.7}
\]

Since $r = \sum (X - \bar{X})(Y - \bar{Y})/Ns_xs_y$,

\[
b_{yx} = r\frac{s_y}{s_x}
\qquad
b_{xy} = r\frac{s_x}{s_y}
\tag{8.8}
\]

Multiplying these two expressions, we obtain

\[ b_{yx}b_{xy} = r^2 \tag{8.9} \]

Thus the product of the slopes of the two regression lines is the square of the
correlation coefficient. The geometric mean of the two slopes is the correlation
coefficient.

Because of the above relation between correlation and regression we may
write equations for the two regression lines using the correlation coefficient.
The two equations are as follows:

\[
\begin{aligned}
Y' &= r\frac{s_y}{s_x}(X - \bar{X}) + \bar{Y} \\
X' &= r\frac{s_x}{s_y}(Y - \bar{Y}) + \bar{X}
\end{aligned}
\tag{8.10}
\]
These are the commonly used equations for predicting a raw score on one 
variable from a knowledge of a raw score on another. 
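
The relations between the two slopes and the correlation coefficient can be verified on any set of paired observations. The sketch below is not from the original text. It uses hypothetical IQ and reading scores, and the names `predict_y` and `predict_x` are chosen here for illustration. Standard deviations use the divisor N.

```python
import math

# Hypothetical paired observations (X = IQ, Y = reading score).
x = [95, 100, 104, 110, 118, 123]
y = [48, 55, 52, 63, 66, 70]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
mean_cross_product = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

r = mean_cross_product / (s_x * s_y)
b_yx = mean_cross_product / s_x ** 2   # slope of Y on X
b_xy = mean_cross_product / s_y ** 2   # slope of X on Y

def predict_y(value_x):
    # Raw-score prediction of Y from X using r, s_y/s_x, and the two means.
    return r * (s_y / s_x) * (value_x - mean_x) + mean_y

def predict_x(value_y):
    # Raw-score prediction of X from Y using r, s_x/s_y, and the two means.
    return r * (s_x / s_y) * (value_y - mean_y) + mean_x

print(b_yx * b_xy, r ** 2)      # product of slopes and r squared
print(predict_y(mean_x), mean_y)  # each line passes through the means
```

The product of the two slopes reproduces the square of r, and each regression line passes through the point defined by the two means.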

If measurements are represented in standard-score form, the correlation
coefficient may be written as $r = \sum z_xz_y/N$. If the pairs of standard scores
are plotted graphically and two regression lines fitted to the data, the equations
of these lines may readily be shown to be

\[
z_y' = rz_x
\qquad
z_x' = rz_y
\tag{8.11}
\]

where $z_y'$ and $z_x'$ are the predicted or estimated standard scores. Both regression
lines have the same slope, which is equal to the correlation coefficient.
In this case the slope of the regression of Y on X relative to the X axis is the
same as the slope of the regression of X on Y relative to the Y axis. If the
data are expressed in standard-score form, the cosine of the angle between
the two regression lines is 2r/(1 + r²); the lines coincide when r is −1 or +1
and are at right angles when r = 0.


8.6. Errors of Estimate 


In predicting one variable from a knowledge of another, distances from 
either the X or Y axis to the regression line are used as the predicted values. 
A difference between an observed value and a predicted value is an error of 
estimate. Thus in predicting Y from X, the predicted value Y' is a distance
from the X axis to the regression line, and the difference between the observed
value of Y and the predicted value, or Y − Y', is an error of estimate. If
the pairs of observations, when plotted graphically as points, all fall exactly
along a straight line, all values of Y − Y' = 0 and perfect prediction is
possible. If the points appear to be arranged at random when plotted
graphically, many values of Y − Y' will be large. The more accurate the
predictions possible, the smaller the values of Y − Y' will tend to be. The
standard deviation of the errors of estimate, that is, of Y − Y', is taken as a
measure of the accuracy of estimate and is given by

\[ s_{y \cdot x} = \sqrt{\frac{\sum (Y - Y')^2}{N}} \tag{8.12} \]


This standard deviation $s_{y \cdot x}$ is known as the standard error of estimate. It is
a measure of the accuracy in predicting Y from a knowledge of X. Similarly,
in predicting X from a knowledge of Y, the standard error of estimate is

\[ s_{x \cdot y} = \sqrt{\frac{\sum (X - X')^2}{N}} \tag{8.13} \]

where X = an observed value
      X' = value of X estimated from a knowledge of Y
The standard error of estimate is related to the correlation coefficient. It
can be shown that

\[ s_{y \cdot x} = s_y\sqrt{1 - r^2} \tag{8.14} \]

and similarly

\[ s_{x \cdot y} = s_x\sqrt{1 - r^2} \tag{8.15} \]


By transposing these formulas we find relations as follows:

\[ r = \sqrt{1 - \frac{s_{y \cdot x}^2}{s_y^2}} = \sqrt{1 - \frac{s_{x \cdot y}^2}{s_x^2}} \tag{8.16} \]


The above constitutes, in effect, an alternate definition of the correlation
coefficient. If all pairs of points when plotted graphically fall exactly on a
straight line, both $s_{y \cdot x}^2 = 0$ and $s_{x \cdot y}^2 = 0$. In consequence, r will be either
+1 or −1, depending on whether we take the positive or the negative square
root. If the points are arranged at random when plotted graphically, X and
Y being independent of each other, $s_{y \cdot x}^2 = s_y^2$, $s_{x \cdot y}^2 = s_x^2$, and r = 0. The
value of the correlation is seen, therefore, to depend on the ratio of two
variances, $s_{y \cdot x}^2/s_y^2$ or $s_{x \cdot y}^2/s_x^2$. These two ratios are equal.
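
The relation between the standard error of estimate and the correlation coefficient may be illustrated numerically. The following sketch is not part of the original text and uses hypothetical data. It computes the standard deviation of the errors Y − Y' directly and compares it with the standard deviation of Y multiplied by the square root of 1 − r².

```python
import math

# Hypothetical paired observations.
x = [95, 100, 104, 110, 118, 123]
y = [48, 55, 52, 63, 66, 70]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n
mean_cross_product = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r = mean_cross_product / math.sqrt(var_x * var_y)

# Least-squares regression line of Y on X.
b_yx = mean_cross_product / var_x
a_yx = mean_y - b_yx * mean_x
predicted = [b_yx * a + a_yx for a in x]

# Errors of estimate Y - Y' and their standard deviation (divisor N).
errors = [observed - estimate for observed, estimate in zip(y, predicted)]
s_yx_direct = math.sqrt(sum(e ** 2 for e in errors) / n)

# The same quantity obtained from s_y and r.
s_yx_from_r = math.sqrt(var_y) * math.sqrt(1 - r ** 2)

print(s_yx_direct, s_yx_from_r)
```

The two printed values agree apart from rounding error. Because the errors about a least-squares line have a mean of zero, their root mean square is also their standard deviation.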


8.7. The Variance Interpretation of the Correlation Coefficient


A correlation coefficient is not a proportion. A coefficient of .60 does not 
represent a degree of relationship twice as great as a coefficient of .30. The 
difference between coefficients of .40 and .50 is not equal to the difference 
between coefficients of .50 and .60. The question arises as to how correlation 
coefficients of different sizes may be interpreted. One of the more informa- 
tive ways of interpreting the correlation coefficient is in terms of variance. 

A score on Y may be viewed as comprised of two parts, an estimated value
Y' and an error of estimation (Y − Y'). Hence Y = Y' + (Y − Y').
These two parts are independent of each other; that is, they are uncorrelated.
The variances of the two parts are directly additive, and we may write

\[ s_y^2 = s_{y'}^2 + s_{y \cdot x}^2 \tag{8.17} \]

where $s_y^2$ = variance of Y
      $s_{y'}^2$ = variance of values of Y predicted from X, that is, values on
      regression line
      $s_{y \cdot x}^2$ = variance of errors of estimation

The variance $s_{y \cdot x}^2 = s_y^2(1 - r^2)$. By substitution we obtain

\[ s_y^2 = s_{y'}^2 + s_y^2(1 - r^2) \]

Dividing this equation by $s_y^2$ and writing it explicit for r, we have

\[ r^2 = \frac{s_{y'}^2}{s_y^2} \tag{8.18} \]

Similarly, it may be shown that

\[ r^2 = \frac{s_{x'}^2}{s_x^2} \tag{8.19} \]


These expressions state that $r^2$ is the ratio of two variances, the variance of
the predicted values of Y or X divided by the variance of the observed
values of Y or X.
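
The partition of the variance of Y into a predicted part and an error part can be checked on a small example. This Python sketch is not from the original text. The data are hypothetical, and the helper `variance` (divisor N) is defined here for illustration.

```python
import math

def variance(values):
    # Variance with divisor N.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical paired observations.
x = [95, 100, 104, 110, 118, 123]
y = [48, 55, 52, 63, 66, 70]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
mean_cross_product = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
r = mean_cross_product / math.sqrt(variance(x) * variance(y))

# Predicted values Y' on the regression line of Y on X, and the errors Y - Y'.
b_yx = mean_cross_product / variance(x)
a_yx = mean_y - b_yx * mean_x
predicted = [b_yx * a + a_yx for a in x]
errors = [observed - estimate for observed, estimate in zip(y, predicted)]

var_y = variance(y)
var_predicted = variance(predicted)
var_errors = variance(errors)

print(var_y, var_predicted + var_errors)  # the two parts add to the whole
print(r ** 2, var_predicted / var_y)      # r squared as a ratio of variances
```

The variance of the predicted values plus the variance of the errors reproduces the variance of Y, and the ratio of predicted to observed variance equals the square of r.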

The variance $s_{y'}^2$ is that part of the variance of Y which can be predicted
from, explained by, or attributed to the variance of X. It is a measure of the 
amount of information we have about Y from our information about X. 
If r = .80, $r^2$ = .64, and we can state that 64 per cent of the variance of the
one variable is predictable from the variance of the other variable. We know
64 per cent of what we would have to know to make a perfect prediction of
the one variable from the other. Thus $r^2$ can quite meaningfully be interpreted
as a proportion and $r^2 \times 100$ as a per cent. In general, in attempting
to conceptualize the degree of relationship represented by a correlation 
coefficient it is more meaningful to think in terms of the square of the cor- 
relation coefficient instead of the correlation coefficient itself. The values 
of $r^2 \times 100$ for values of r from .10 to 1.00 are as follows:


   r      r² × 100
  .10         1
  .20         4
  .30         9
  .40        16
  .50        25
  .60        36
  .70        49
  .80        64
  .90        81
 1.00       100


Thus a correlation of .10 represents a 1 per cent association, a correlation of 
.50 represents a 25 per cent association, and the like. A correlation of 
.7071 is required before we can state that 50 per cent of the variance of the
one variable is predictable from the variance of the other. With a correla- 
tion as high as .90 the unexplained variance is 19 per cent. 

The existence of a correlation between two variables is indicative of a
functional relationship, but does not necessarily imply a causal relationship.
Whether a functional relationship can be regarded as a causal relationship is a
matter of interpretation. The correlation between the intelligence of
parents and their offspring has been frequently reported to be of the order of
.50. This may be interpreted as indicative of a causal relationship. Frequently
two variables may correlate because both are correlated with some
other variable or variables. For example, given a group of children with a
substantial range of ages, a correlation may be found between a measure of
intelligence and a measure of motor ability. Such a correlation may come
about because the measures of intelligence and motor ability are both correlated
with age. If the effects of age are removed, the correlation may vanish.


8.8. Assumptions Underlying the Correlation Coefficient 


In interpreting the correlation coefficient it is assumed that the fitting of
two straight regression lines to the data does not distort or conceal the
functional relation between the two variables. If the relation is curvilinear,
a coefficient of zero may be obtained and yet a close relation may exist
between the two variables. Figure 8.3 shows a curvilinear relation between
X and Y. If X is known, a fairly accurate prediction can be made of Y. If,
however, two straight regression lines are fitted to the data, these lines will
be about at right angles to each other and r will be about zero. If a strictly
random relation exists between X and Y, the correlation will be zero. The
above example demonstrates that the converse does not hold. If the correlation
is zero, it does not necessarily follow that X and Y bear a random relation
to each other. This may mean that the linear-regression model is a poor fit
to the data. In interpreting the correlation coefficient it is ordinarily
assumed that the linear-regression model is a good fit to the data and that a
correlation of zero means a random relation. Consider a situation where
r = .80. This means that 64 per cent of the variance of the one variable is
predictable from the other and the residual 36 per cent is due to other factors.
The assumption is that these factors do not include, at least to any appreciable
extent, a lack of goodness of fit of the linear-regression lines to the data. If a
large proportion of the residual 36 per cent did result because of nonlinearity,
this would affect the interpretation of the data. In interpreting a correlation
coefficient the investigator should satisfy himself that the linear-regression
lines are a good fit to the data. Any gross departure from linearity can
readily be detected by inspection of the bivariate frequency table. For small
values of N, curvilinear relations may be difficult to detect. In practice, for
many of the variables used in psychology and education the assumption of
linearity of regression is in most instances reasonably well satisfied.

Fig. 8.3. Scatter diagram showing curvilinear relation.
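
A small constructed example shows how a zero correlation can accompany a close curvilinear relation. The sketch below is not from the original text; the data are hypothetical and chosen so that Y is an exact function of X (Y = X²) over values of X symmetric about zero.

```python
import math

# Hypothetical data: Y is completely determined by X, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
cross_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sum_sq_x = sum((a - mean_x) ** 2 for a in x)
sum_sq_y = sum((b - mean_y) ** 2 for b in y)

r = cross_products / math.sqrt(sum_sq_x * sum_sq_y)
b_yx = cross_products / sum_sq_x   # slope of the straight line of Y on X

print(r, b_yx)
```

Both r and the slope of the fitted straight line are zero here, although every value of Y can be predicted exactly from X. The straight regression line is simply a poor description of these data.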

In calculating a correlation coefficient it need not be assumed that the 
distributions of the two variables are normal. Correlations can be computed 
for rectangular and other types of distributions. If the two variables have 
different shapes, however, this circumstance will impose constraints upon the 
correlation coefficient. If a positively skewed distribution is correlated with 
a negatively skewed distribution, the differences in the shapes of the dis- 
tributions will influence the correlation coefficient. Some part of the depar- 
ture of the correlation coefficient from unity will result because of the different 
shapes of the two distributions. In such a situation as this the differences in 
shapes of the distributions will in effect ensure that one or the other or both 
regression lines are nonlinear. In psychological research substantial dif- 
ferences in the shapes of the distributions under study occasionally are found. 
Under these circumstances it is common practice to transform the variables 
to a binomial or to an approximately normal form. Such transformations 
will frequently tend to eliminate curvilinearity of regression. 

Many other circumstances affect the correlation coefficient. Among these 
may be mentioned sampling error and errors of measurement. The effects of 
these on the correlation coefficient are discussed in later chapters. 


EXERCISES 


1. The following are marks on a college entrance examination (X) and first year averages 
(Y) for a sample of 20 students.


X Y X Y X Y X Y
SS 61 70 75 63 85 77 84 
79 72 80 ól 64 87 62 72 
» ө 8$ 79 69 70 85 70 
81 89 92 90 75 90 55 60 
62 52 60 55 84 67 66 67 




Compute (a) the correlation between entrance-examination marks and first-year averages,
(b) the regression equation for predicting first-year averages from examination
marks, (c) the predicted first-year averages for the 20 students, (d) the variance of the
errors of estimation.

2. Standard scores on variable X for four individuals are —2.0, —1.68, .18, 1.16. The 
correlation between X and Y is .50. What are the estimated standard scores on Y? 
What is the standard error in estimating standard scores on Y from standard scores 
on X? 

3. From the data $\bar{X}$ = 40.3, $\bar{Y}$ = 12.5, $s_x$ = 12.6, $s_y$ = 3.6, and $r_{xy}$ = .60, write the regression
equations for predicting Y from X and X from Y.

4. Show that $r_{xy} = \sqrt{b_{yx}b_{xy}}$.

5. A correlation of .7071 may be interpreted to mean that 50 per cent of the variance of 
one variable is predictable from the other variable. Is this statement correct if the
regression lines are not linear? 


CHAPTER 9


ESSENTIAL IDEAS OF SAMPLING 


9.1. Introduction 


In Chap. 1 the concepts of sample and population were discussed. A 
population is any defined aggregate of objects, persons, or events, the variable 
used as the basis of classification being specified. A sample is any sub- 
aggregate drawn from the population. Any statistic calculated on a sample 
of observations is an estimate of a corresponding population value, or parameter.
The symbol $\bar{X}$ is used to refer to the mean of X calculated on a sample of
size N. The symbol $\mu$ is used to refer to the mean of the population. Similarly,
s is used to refer to the standard deviation in the sample and $\sigma$ is the
corresponding population parameter. $\bar{X}$ is an estimate of $\mu$, and s is an
estimate of $\sigma$. Likewise, any statistic calculated on a sample is an estimate
of a corresponding population parameter. In most situations the parameters 
are unknown and must be estimated from the sample data. 

The body of statistical methodology concerned with the problem of making 
statements about population parameters from sample values is called sam- 
pling statistics, and the logical process used is called statistical inference, this 
being a rigorous form of inductive inference. 

In drawing inferences about the characteristics of populations from sample
values, it is assumed that the members of the sample are drawn at random 
from the population. The word “random” may be used in at least three 
ways. It may be used to refer to our subjective feelings that certain events 
are haphazard or completely lacking in order. It may be used in a theoretical 
sense with reference to the equiprobability of events. Thus a random sample 
is one drawn in such a fashion that every member of the population has an
equal probability of being included in it. When used in this way the meaning
of the word is assigned to it within the framework of probability theory. The
word random is also used in an operational sense to describe certain operations 
or methods. Thus the drawing of numbers from a hat after they have been 
thoroughly mixed, or the drawing of cards from a deck after they have been 
shuffled, or certain techniques used in sweepstakes and lotteries are examples 
of random operations or methods. Sampling theory in statistics is based on 
the theoretical use of the word random; that is, on the idea of the equiproba- 
bility of each population member being included in the sample. The con- 
sequences of this theoretical approach may be tested by experiment. 

9.2. Sampling Errors and Sampling Distributions 


Let us now consider the nature of the error associated with particular 
sample values. What precisely is a sampling error? A sampling error is a 
difference between a population value, or parameter, and a particular sample 
value. Thus, if $\mu$ is the population value of the mean and $\bar{X}_i$ is an estimate
based on a random sample of size N, then the difference $\mu - \bar{X}_i = e_i$, where
$e_i$ is a sampling error. The concept of error in any context always implies a
parametric, true, fixed, or standard value from which a given observed value 
may depart in greater or less degree. The idea that something in the nature 
of a parametric or true value can meaningfully be defined is of the essence of 
the concept of error. Without some appropriate definition of such a value, 
the concept of error has no meaning and no theory of errors is possible. 
Also no science is possible. 

How may the magnitude of error be estimated and described? Common 
sense suggests that in the measurement of any quantity some appreciation of 
the magnitude of error may be obtained by repeating the measurements a 
number of times and observing how these repeated measurements vary from 
each other. Thus in the measurement of the length of a bar of metal a series
of separate measurements may be made under constant conditions. In this 
case each measurement is an estimate of the same “true” length; hence the 
variation observed with repeated measurement is due to error. To describe 
the magnitude of the error the standard deviation of the repeated measure- 
ments may be used. Thus in this situation the standard deviation becomes a 
measure of the magnitude of error. 

The above example is concerned with errors associated with particular
observations, namely, measurements of a bar of metal. In considering the
magnitude of error associated with, say, a sample mean $\bar{X}_i$ as an estimate of a
population mean $\mu$, the situation is similar. The problem may be approached
experimentally. A number, say, k, of samples of size N may be drawn at
random from the same population and a mean calculated for each sample.
These means may be represented by the symbols $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$.
Each mean is an estimate of the same population mean $\mu$. The difference
between the population mean $\mu$ and any particular sample mean $\bar{X}_i$ is a
sampling error $e_i$. Thus

\[
\begin{aligned}
\mu - \bar{X}_1 &= e_1 \\
\mu - \bar{X}_2 &= e_2 \\
\mu - \bar{X}_3 &= e_3
\end{aligned}
\]

The standard deviation of the k sample means may be calculated. This
standard deviation is a measure of the magnitude of the error associated with
$\bar{X}_i$ as an estimate of $\mu$. It is descriptive of the accuracy of the sample value as
an estimate of the population parameter. Note that the standard deviation
of the sample means $\bar{X}_i$ is the same as the standard deviation of the sampling
errors $e_i$, because the population mean $\mu$ acts as a constant.
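
The experimental approach just described can be imitated with a short simulation. The sketch below is not part of the original text. The population of 100 scores, the number of samples k = 200, the sample size of 10, and the seed are all arbitrary choices made for illustration.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1959)            # fixed seed so the run is repeatable
population = list(range(1, 101))     # hypothetical population of 100 scores
mu = mean(population)                # population mean

k, n = 200, 10                       # k samples, each of size n
sample_means = [mean(rng.sample(population, n)) for _ in range(k)]

# Sampling error of each sample mean as an estimate of mu.
sampling_errors = [mu - m for m in sample_means]

sd_of_means = pstdev(sample_means)
sd_of_errors = pstdev(sampling_errors)
print(sd_of_means, sd_of_errors)
```

The two standard deviations coincide, since subtracting each mean from the constant population mean changes the sign and location of the values but not their spread.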

In the above discussion the problem of estimating error has been approached 
experimentally; that is, we considered the actual drawing of a number of 
samples and approached the experimental study of error through observed 
sample-to-sample fluctuation. The problem of estimating error can be 
approached theoretically using the theory of probability. Estimates of error 
may be made without the drawing of repeated samples. Consider for 
illustrative purposes a small finite population of eight members. Let the 
members of the population be cards numbered from 1 to 8. These cards may 
be shuffled, a sample of four cards drawn without replacement, and a mean 
calculated for the sample. This procedure may be repeated 100 times, and a 
frequency distribution made of the 100 sample means. This distribution is 
an experimental sampling distribution, and its standard deviation is a measure 
of the fluctuation in means from sample to sample. A theoretical as distinct 
from an experimental approach may be used. Given a finite population of 
eight members, a limited number of different samples of four cards exist. 
The number of such samples is the number of combinations of eight things 
taken four at a time, or $C_4^8 = 70$. Each of these 70 samples may be
considered equiprobable. The means for the 70 possible samples may be
ascertained, and a frequency distribution prepared. This frequency dis- 
tribution is a theoretical sampling distribution. It is obtained by direct 
reference to probability considerations. No drawing of actual samples is 
involved. The standard deviation of the theoretical sampling distribution 
is a measure of fluctuation in means from sample to sample. 
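
The theoretical sampling distribution for the eight-card population can be enumerated directly. The following Python sketch is not from the original text. It lists every combination of four cards with `itertools.combinations` and tabulates the sample means; the variable names are chosen here for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

cards = range(1, 9)                       # population: cards numbered 1 to 8
samples = list(combinations(cards, 4))    # every possible sample of four cards

# Mean of each possible sample; each sample is treated as equiprobable.
sample_means = [sum(sample) / 4 for sample in samples]

# Theoretical sampling distribution of the mean.
frequency_distribution = Counter(sample_means)
standard_deviation = pstdev(sample_means)

print(len(samples))
for value in sorted(frequency_distribution):
    print(value, frequency_distribution[value])
print(mean(sample_means), standard_deviation)
```

The enumeration yields 70 samples. The mean of the 70 sample means equals the population mean of 4.5, and their standard deviation is the square root of .75, or about .87. No drawing of actual samples is involved.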

In the above example the population is small and finite. In practice most 
of the populations with which we deal are indefinitely large, or if finite, they 
are so large that for all practical purposes they can be considered indefinitely 
large. In the study of sampling error the approach used in dealing with an 
indefinitely large population is a simple extension of that used with a small 
finite population. The distinction between an experimental and theoretical 
sampling distribution still applies. Where the population is indefinitely 
large the theoretical sampling distribution of, for example, the mean is the 
frequency distribution of means of the indefinitely large number of samples 
of size N which, theoretically, could be drawn. 

The theoretical sampling distributions are known for all commonly used 
statistics. The standard deviation of the sampling distribution is called the 
standard error. Thus a standard error is always a standard deviation which 
describes the variability of a statistic over repeated sampling. The standard
deviation of a theoretical sampling distribution is in effect a population
parameter. It is descriptive of the variation of a statistic in a complete
population of sample values. The standard deviation of the theoretical
sampling distribution of the mean is represented by the symbol σ_X̄. In
practice this standard deviation must be estimated from sample data. This
estimate in the case of the mean may be represented by the symbol s_X̄. For
most statistics fairly simple formulas are available for estimating the standard
deviation of the theoretical sampling distribution.

The theoretical sampling distributions of some statistics are normal, or 
approximately so; others are not. For example, the theoretical sampling 
distribution of the mean X̄ in sampling from an indefinitely large normally
distributed population is normally distributed. The sampling distribution 
of the correlation coefficient presents a complicated problem. It is not 
normally distributed except under certain special circumstances. Where 
the shape of the sampling distribution is known, certain kinds of statements 
can be made about a population value from a sample estimate. For example,
it is possible to fix limits above and below a sample value and assert with a 
known degree of confidence that the population parameter falls within those 
limits. The fixing of such limits requires a knowledge of the shape of the 
sampling distribution. Such limits are known as confidence limits and are 
discussed in detail later in this chapter. 


9.3. Sampling Distribution of Means from a Finite Population 


In practice, most samples are viewed as drawn from indefinitely large 
populations. The essential ideas of sampling may, however, be conveniently 
illustrated with reference to a small finite population. Suppose, as mentioned 
above, that we have a population of eight cards numbered from 1 to 8. These 
cards may be shuffled, and a sample of four cards drawn at random. After 
each card is drawn it is not returned; that is, the sampling is without
replacement. A mean X̄ may be calculated for this sample. The four cards may
now be returned, the eight cards shuffled, another sample of four cards drawn, 
and another mean calculated. Let us continue this procedure until 100 
samples of four cards have been drawn and their means calculated. Table 
9.1, col. 3, shows the frequency distribution of 100 such sample means. This 
distribution is an experimental sampling distribution of means. It shows 
experimentally how the means of samples of four drawn at random without 
replacement from a population of eight vary from sample to sample. The 
mean of the experimental sampling distribution X̄_X̄, that is, the mean of the
100 means based on samples of four, is found to be 4.56. The mean of
the population from which the samples have been drawn is the mean of the
integers from 1 to 8 and is 4.50. The standard deviation of the 100 means
s_X̄ is found to be .826.




The investigation of the fluctuation in sample means may be approached
theoretically. The number of different samples of four in sampling without
replacement from a population of eight members is the number of combinations
of eight things taken four at a time, or C(8, 4) = 70. These 70 samples
may be considered equiprobable. A listing of the 70 samples may readily
be made, and the means calculated. The sample with the smallest mean will
be 1, 2, 3, 4; the mean X̄ = 2.50. The sample with the largest mean will be
5, 6, 7, 8; here X̄ = 6.50. Thus X̄ will range from 2.50 to 6.50. Table 9.1,
col. 5, shows the frequency distribution of the 70 sample means. This
distribution is a theoretical sampling distribution of the mean of samples of
four from a small finite population of eight members. It is based on the idea
that there are 70 possible combinations of eight things taken four at a time,
all combinations being equiprobable.


TABLE 9.1

EXPERIMENTAL AND THEORETICAL SAMPLING DISTRIBUTIONS OF MEANS OF
SAMPLES OF FOUR DRAWN FROM A POPULATION OF EIGHT MEMBERS

                Experimental       Theoretical
                distribution       distribution
 ΣX      X̄        f       p          f       p
 (1)    (2)      (3)     (4)        (5)     (6)
 10     2.50      1     .010         1     .014
 11     2.75      2     .020         1     .014
 12     3.00      0     .000         2     .029
 13     3.25      5     .050         3     .043
 14     3.50      7     .070         5     .071
 15     3.75      7     .070         5     .071
 16     4.00      8     .080         7     .100
 17     4.25     11     .110         7     .100
 18     4.50     13     .130         8     .114
 19     4.75     10     .100         7     .100
 20     5.00     10     .100         7     .100
 21     5.25      9     .090         5     .071
 22     5.50      7     .070         5     .071
 23     5.75      4     .040         3     .043
 24     6.00      3     .030         2     .029
 25     6.25      1     .010         1     .014
 26     6.50      2     .020         1     .014

 Total          100    1.000        70    1.000

The mean of the theoretical sampling distribution may be calculated.
This mean μ_X̄ is found to be 4.50. The standard deviation σ_X̄ is found to be
.866. These values do not differ markedly from the mean and standard
deviation of the experimental sampling distribution, these being X̄_X̄ = 4.56
and s_X̄ = .826. Presumably, had a larger number of samples been drawn,
say, 200 or 1,000, the experimental sampling distribution would be observed
to approximate more closely to the theoretical distribution.
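
The comparison just made can be repeated by machine. The following is a minimal sketch in Python, offered as an illustration only; the variable names, the helper functions, and the random seed are assumptions introduced for the example. It lists every possible sample of four from the cards 1 to 8 to obtain the theoretical sampling distribution, and then draws 100 random samples without replacement to obtain an experimental one.

```python
import itertools
import math
import random
from collections import Counter

population = range(1, 9)   # the eight cards numbered 1 to 8
sample_size = 4

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with divisor equal to the number of values."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Theoretical sampling distribution: every possible sample, each equiprobable.
theoretical_means = [mean(s) for s in itertools.combinations(population, sample_size)]
theoretical_freq = Counter(theoretical_means)

# Experimental sampling distribution: 100 random draws without replacement.
rng = random.Random(1959)   # arbitrary seed (an assumption), so the run is repeatable
experimental_means = [mean(rng.sample(population, sample_size)) for _ in range(100)]

print(len(theoretical_means))                          # number of possible samples
print(mean(theoretical_means), sd(theoretical_means))  # theoretical mean and sd
print(mean(experimental_means), sd(experimental_means))
```

The experimental values will differ from run to run and from those of Table 9.1, since each set of 100 samples is itself a random outcome; the theoretical values do not change.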

The mean and standard deviation of the theoretical sampling distribution 
of Table 9.1 were calculated directly from the 70 possible sample means. 
These values may, however, be readily obtained without using this
time-consuming method. It can be shown that the mean of the theoretical
sampling distribution is equal to the population mean; that is, μ_X̄ = μ. In our
example the mean of the sampling distribution of samples of four from a
population of eight members is observed to be 4.50. Likewise, the population
mean, that is, the mean of the integers from 1 to 8, is also 4.50. The
standard deviation of the theoretical sampling distribution is given by the
formula

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_p - N}{N_p - 1}} \tag{9.1} \]

where σ = standard deviation in population
N_p = number of members in population
N = sample size
In our example σ is the standard deviation of the integers from 1 to 8 and is
equal to 2.29. Population and sample size are, respectively, 8 and 4. Hence

\[ \sigma_{\bar{X}} = \frac{2.29}{\sqrt{4}} \sqrt{\frac{8 - 4}{8 - 1}} = .866 \]


If, then, the standard deviation σ of the population is known, we can readily
obtain from the above formula the standard deviation of the theoretical 
sampling distribution and use this as a measure of fluctuation in means from 
sample to sample. 
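
The arithmetic of formula (9.1) may be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `standard_error_finite` and its parameter names are invented for the illustration and do not come from the text.

```python
import math

def standard_error_finite(sigma, population_size, sample_size):
    """Formula (9.1): standard error of the mean, sampling without replacement."""
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return sigma / math.sqrt(sample_size) * correction

# Population of the card example: the integers 1 to 8.
cards = list(range(1, 9))
mu = sum(cards) / len(cards)
sigma = math.sqrt(sum((x - mu) ** 2 for x in cards) / len(cards))

se = standard_error_finite(sigma, population_size=8, sample_size=4)
print(round(sigma, 2), round(se, 3))   # 2.29 0.866
```

When the sample exhausts the population (sample size equal to population size) the correction term is zero, and so is the standard error: every such sample has the same mean.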

A knowledge of the standard deviation of a theoretical sampling distribu- 
tion is of limited usefulness unless additional information is available on the 
shape of the distribution. In certain instances sampling distributions are 
normal, or approximately normal in form. The theoretical sampling
distribution of Table 9.1 departs appreciably from the normal form. If, however,
both sample and population size were increased, the distribution would
approximate more closely to the normal form. For N = 30 and N_p = 100,
the normal distribution would be a good approximate fit. If the sampling
distribution is approximately normal, we can, given its standard deviation, 
readily estimate the probability of obtaining values equal to or greater than 
any given size in random sampling from the population. 




9.4. Sampling Distribution of Means from an Indefinitely 
Large Population 


Many populations may be conceptualized as comprised of an indefinitely 
large number of members. Most applications of sampling theory encoun- 
tered in psychology and education assume such populations. Sampling from 
an indefinitely large population is essentially the same as sampling from a 
finite population with replacement, that is, where each sample member is 
returned to the population prior to the drawing of the next member. In 
sampling without replacement from an indefinitely large population the 
probabilities remain unchanged regardless of the size of the sample drawn. 
Similarly, in sampling from a finite population with replacement, the popula- 
tion is not depleted and the probabilities are unchanged by the number of 
prior draws. It follows that problems of sampling from an indefinitely large 
population can be approached through the study of finite populations where 
samples are drawn with replacement. 


TABLE 9.2*
POPULATION FROM WHICH SAMPLES WERE DRAWN: FREQUENCY DISTRIBUTION
OF NUMBERS

Number   Frequency      Number   Frequency
   1          1           14        174
   2          2           15        154
   3          4           16        127
   4          7           17         96
   5         14           18         67
   6         26           19         43
   7         43           20         26
   8         67           21         14
   9         96           22          7
  10        127           23          4
  11        154           24          2
  12        174           25          1
  13        181
                          N_p = 1,611


* Tables 9.2 to 9.4 are reproduced from R. W. B. Jackson and George A. Ferguson, 
Manual of educational statistics, University of Toronto, Department of Educational 
Research, Toronto, 1942. 


To illustrate sampling from an indefinitely large population, an artificial
population was constructed. This population was comprised of 1,611 cards
containing the numbers from 1 to 25. The distribution of numbers is
approximately normal. The distribution of this population is
shown in Table 9.2. The mean μ of the population is 13, and the standard
deviation σ is 3.56. The cards were inserted in a box, and samples of 10
cards drawn with replacement; that is, a card was drawn, its number noted,
and the card then returned to the box before the next draw. Altogether
100 samples of 10 cards were drawn, and the 100 means calculated. Table
9.3 shows the means of the samples. Table 9.4, col. 2, shows a frequency
distribution of these means. This distribution is an experimental sampling
distribution of means based on samples of size 10. The mean of the sampling
distribution X̄_X̄ is 13.205, and the standard deviation s_X̄ is 1.128. This
standard deviation is a description based on experimental data of the
sample-to-sample fluctuation of means of samples of size 10 drawn at random from
this population.


TABLE 9.3
MEANS OF SAMPLES OF 10 DRAWN FROM THE POPULATION IN TABLE 9.2

10.9  13.5  11.7  13.3  13.8  12.5  15.0  12.7  14.3  12.7
13.0  13.2  14.0  13.1  13.2  12.7  12.6  11.5  13.2  12.9
12.4  13.9  14.1  12.2  13.1  11.7  11.5  14.6  12.6  12.9
13.9  14.0  11.7  12.1  13.2  13.6  14.4  14.0  12.2  13.7
12.6  11.6  11.8  12.1  13.1  13.2  12.5  14.0  16.4  12.2
12.6  13.7  13.6  14.0  12.1  13.2  14.8  13.6  12.5  14.5
14.4  13.9  13.8  15.1  14.2  14.4  13.5  12.7  14.5  14.4
12.9  11.3  14.5  13.0  12.0  13.3  12.7  14.8  11.3  11.0
12.7  14.6  15.2  14.1  16.1  14.7  12.3  11.2  14.3  14.7
12.9  12.3  11.9  14.0  14.5  12.4  11.9  12.3  12.4  12.6


TABLE 9.4
EXPERIMENTAL AND THEORETICAL SAMPLING DISTRIBUTION OF 100 SAMPLE MEANS
FOR SAMPLES OF SIZE 10 DRAWN FROM THE POPULATION OF TABLE 9.2

                          Frequency
Class interval     Experimental   Theoretical
  16.5-17.4
  15.5-16.4
  14.5-15.4
  13.5-14.4
  12.5-13.4
(The lower class intervals and the two frequency columns are not legible in this copy.)

The mean and standard deviation of the sampling distribution need not be
estimated by the rather laborious experimental approach described above.
It can be shown that the mean of the theoretical sampling distribution of the
mean in sampling from an indefinitely large population is equal to the
population mean; that is, μ_X̄ = μ. It can also be shown that the standard deviation
of the sampling distribution is given by

\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} \tag{9.2} \]

where σ = standard deviation in population
N = size of sample

The reader will observe that the difference between this formula and the
formula previously given for the standard deviation of the sampling
distribution for the means of samples from a finite population resides in the
absence here of the term √((N_p − N)/(N_p − 1)). As N_p increases, this term
approaches 1 as a limit. It is equal to 1 when the population is indefinitely
large. The standard deviation of the theoretical sampling distribution is
σ_X̄ = 3.56/√10 = 1.126. This is very close to the standard deviation of the
experimental sampling distribution s_X̄ which was found to be 1.128.
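
The population values quoted above, and the standard error given by formula (9.2), can be recomputed from the frequencies of Table 9.2. The sketch below is an illustration in Python, not a transcription of the original experiment. The frequency for the number 5 is taken as 14, the value required by the symmetry of the table and by the stated total of 1,611 cards; the random seed and the variable names are assumptions.

```python
import math
import random

# Frequencies of the numbers 1 to 25 in the artificial population (Table 9.2).
# The entry for the number 5 is taken as 14 (see the note above).
frequencies = [1, 2, 4, 7, 14, 26, 43, 67, 96, 127, 154, 174, 181,
               174, 154, 127, 96, 67, 43, 26, 14, 7, 4, 2, 1]
numbers = range(1, 26)

n_cards = sum(frequencies)
mu = sum(x * f for x, f in zip(numbers, frequencies)) / n_cards
sigma = math.sqrt(sum(f * (x - mu) ** 2 for x, f in zip(numbers, frequencies)) / n_cards)

sample_size = 10
standard_error = sigma / math.sqrt(sample_size)   # formula (9.2)

# Repeat the card-drawing experiment: 100 samples of 10, drawn with replacement.
rng = random.Random(0)   # arbitrary seed (an assumption)
cards = [x for x, f in zip(numbers, frequencies) for _ in range(f)]
sample_means = [sum(rng.choices(cards, k=sample_size)) / sample_size for _ in range(100)]

print(n_cards, mu, round(sigma, 2), round(standard_error, 3))   # 1611 13.0 3.56 1.126
```

The list `sample_means` plays the part of Table 9.3 for a fresh run of the experiment; its standard deviation may be compared with `standard_error` in the same way as 1.128 is compared with 1.126 in the text.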

The theoretical sampling distribution of the means of samples drawn from 
a normal population is normal. Thus if we know that the population dis- 
tribution is normal, we know that the sampling distribution of means is 
normal. Regardless of the shape of the population distribution, the sampling 
distribution of means will approximate the normal form as N increases in
size. For practical purposes the distribution may be taken as approximately
normal for samples of reasonable size, except in the case of fairly gross
departures of the population from normality. The theoretical normal 
frequencies have been calculated for our illustrative example. These 
theoretical frequencies are shown in Table 9.4, col. 3. These are the expected 
normal frequencies for a normal curve with a mean of 13.00 and a standard 
deviation of 1.126. The differences between the experimental and the 
theoretical normal distribution are not very great. 
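
The expected normal frequencies can be recomputed from the normal curve. The following Python sketch tallies the 100 means of Table 9.3 into class intervals of the kind used in Table 9.4 and sets beside them the frequencies expected under a normal curve with mean 13.00 and standard deviation 1.126. It is an illustration only: the real class limits are assumed to extend 0.05 beyond the written limits, the run of intervals from 9.5 to 17.4 is an assumption, and the computed figures are not claimed to reproduce the printed table digit for digit.

```python
from statistics import NormalDist, fmean, pstdev

# The 100 sample means of Table 9.3.
sample_means = [
    10.9, 13.5, 11.7, 13.3, 13.8, 12.5, 15.0, 12.7, 14.3, 12.7,
    13.0, 13.2, 14.0, 13.1, 13.2, 12.7, 12.6, 11.5, 13.2, 12.9,
    12.4, 13.9, 14.1, 12.2, 13.1, 11.7, 11.5, 14.6, 12.6, 12.9,
    13.9, 14.0, 11.7, 12.1, 13.2, 13.6, 14.4, 14.0, 12.2, 13.7,
    12.6, 11.6, 11.8, 12.1, 13.1, 13.2, 12.5, 14.0, 16.4, 12.2,
    12.6, 13.7, 13.6, 14.0, 12.1, 13.2, 14.8, 13.6, 12.5, 14.5,
    14.4, 13.9, 13.8, 15.1, 14.2, 14.4, 13.5, 12.7, 14.5, 14.4,
    12.9, 11.3, 14.5, 13.0, 12.0, 13.3, 12.7, 14.8, 11.3, 11.0,
    12.7, 14.6, 15.2, 14.1, 16.1, 14.7, 12.3, 11.2, 14.3, 14.7,
    12.9, 12.3, 11.9, 14.0, 14.5, 12.4, 11.9, 12.3, 12.4, 12.6,
]

# Theoretical sampling distribution of the mean for samples of 10.
sampling_dist = NormalDist(mu=13.00, sigma=1.126)

# Written class limits 9.5-10.4, 10.5-11.4, ..., 16.5-17.4 (assumed range).
table = []
for k in range(8):
    lo = 9.5 + k
    hi = lo + 0.9
    # Real limits assumed to lie 0.05 beyond the written limits.
    observed = sum(1 for m in sample_means if lo - 0.05 <= m < hi + 0.05)
    expected = 100 * (sampling_dist.cdf(hi + 0.05) - sampling_dist.cdf(lo - 0.05))
    table.append((lo, hi, observed, expected))

for lo, hi, observed, expected in reversed(table):
    print(f"{lo:4.1f}-{hi:4.1f}  {observed:3d}  {expected:6.1f}")

print(round(fmean(sample_means), 3), round(pstdev(sample_means), 3))
```

The last line recomputes the mean and standard deviation of the experimental sampling distribution from the tabled means, using the divisor N for the standard deviation.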

Examination of the formula σ_X̄ = σ/√N indicates that the standard error
of the mean is directly proportional to the standard deviation of the population
and inversely proportional to the square root of the size of sample. Thus the
greater the variation of the variable in the population, the greater the
standard error; also the larger the
size of N, the smaller the standard error. The standard error of means of
samples of N = 1, the smallest sample size possible, is equal to the population
standard deviation. For any fixed value of σ the standard error can be made
as small as we like by increasing the size of the sample.


9.5. Confidence Intervals of Means for Large Samples 


In the above discussion we considered the mean and standard deviation of 
the distribution of sample means where the parameters of the population 
were known. In most practical situations the parameters are not known and 
we have to consider the problem of drawing inferences about the population 
from sample data alone. Thus given a known sample mean X̄, for example,
what kind of statement can be made about the unknown population
mean μ?

An approach to this problem is to specify an interval within which we may 
assert with some known degree of confidence that the population mean lies. 
Thus a mean based on a sample of N observations may be 26.88. We may
perform a simple calculation, which will shortly be described, and assert
with 95 per cent confidence that the population mean falls within the limits
24.92 and 28.84. These values are called confidence limits, and the interval
they contain is called a confidence interval. The mean, X̄ = 26.88, is called a
point estimate; the interval 24.92 to 28.84 constitutes an interval estimate. 

What meaning attaches to the statement that we are 95 per cent confident 
that the actual population mean falls within certain specified limits? Were 
we to draw another sample of the same size, the mean might be found to be
25.68 and the 95 per cent confidence limits calculated as 23.72 and 27.64.
Presumably we could draw a large number of samples, obtain a large number
of upper and lower limits, and prepare frequency distributions of these upper
and lower limits. These two distributions would be experimental sampling
distributions for the 95 per cent confidence limits. Without elaborating the 
details of this situation we state that about 95 per cent of the intervals so 
obtained would include the population mean and about 5 per cent of the 
intervals would not include the population mean. Thus the statement that 
we are 95 per cent confident implies that we expect about 95 per cent of our 
assertions to be correct and the remaining 5 per cent to be incorrect, or that 
the odds are 19:1 that the confidence interval includes the population value. 
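
The long-run meaning of a 95 per cent statement can be observed by simulation. The sketch below, in Python, draws many samples from a normal population with known mean, fixes large-sample limits of the form X̄ ± 1.96 s/√N for each, and counts how often the limits include the population mean. The population values (mean 100, standard deviation 15), the sample size, the number of samples, and the random seed are all hypothetical choices made for the illustration.

```python
import math
import random

def confidence_limits(sample, z=1.96):
    """Large-sample limits: mean plus and minus z estimated standard errors."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / n)   # divisor N
    se = s / math.sqrt(n)
    return m - z * se, m + z * se

rng = random.Random(42)            # arbitrary seed (an assumption)
mu, sigma = 100.0, 15.0            # hypothetical population values
n_samples, sample_size = 2000, 100

hits = 0
for _ in range(n_samples):
    sample = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    lower, upper = confidence_limits(sample)
    if lower <= mu <= upper:
        hits += 1

coverage = hits / n_samples        # proportion of intervals including mu
print(coverage)
```

The proportion printed should lie near .95, though it will not equal .95 exactly in any finite run.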

The use of a 95 per cent confidence interval is fairly common. If a greater
degree of confidence is desired, a 99 per cent interval may be used. This
interval will, of course, be wider than the 95 per cent interval, very roughly
1.3 times as wide. Thus as we increase our level of confidence the interval
is increased. Likewise, of course, as we decrease the level of confidence the
interval is decreased. Any desired level of confidence can be obtained by
varying the size of the confidence interval. As the confidence level is
decreased and approaches zero, the confidence interval approaches zero as a
limit. As the confidence level is increased and approaches 100 per cent, the
confidence interval approaches infinity as a limit. In practice, 95 and 99 per cent
confidence intervals are widely used.

The calculation of confidence intervals for a mean based on a large sample 
is a relatively simple procedure. The standard deviation of the sampling 
distribution of the mean, the standard error, is, as previously stated, given 
by σ_X̄ = σ/√N. The population standard deviation σ is unknown. If we
use the sample standard deviation s as an estimate of σ, we obtain an estimate
of the standard error. This estimate of the standard error is given by

\[ s_{\bar{X}} = \frac{s}{\sqrt{N}} \tag{9.3} \]


If the sample is large, the 95 per cent confidence limits are given by
X̄ ± 1.96 s_X̄; thus the upper limit is 1.96 standard error units above the sample
mean and the lower limit is 1.96 standard error units below the sample mean.
The figure 1.96 derives, as the reader will recall, from the fact that 95 per cent
of the area of the normal curve falls within the limits ±1.96 standard
deviation units from the mean. To illustrate the fixing of confidence
intervals, let the mean IQ of a random sample of 100 secondary school
children be 114 and the standard deviation 17. Our estimate of the standard
error of the mean is s_X̄ = 17/√100 = 1.70. The 95 per cent confidence
interval is then given by 114 ± 1.96 × 1.70. The upper limit is 117.33, and
the lower limit is 110.67. Thus we may assert with 95 per cent confidence
that the population mean falls within these limits. The 99 per cent
confidence limits are given by X̄ ± 2.58 s_X̄. The figure 2.58 derives from the
fact that 99 per cent of the area of the normal curve falls within the limits
±2.58 standard deviation units from the mean. In the above
example the 99 per cent confidence limits are given by 114 ± 2.58 × 1.70.
These limits are 109.61 and 118.39.
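
The IQ example reduces to a two-line calculation. A minimal sketch in Python follows; the function name `large_sample_limits` and its parameters are assumptions made for the illustration.

```python
import math

def large_sample_limits(mean, s, n, z):
    """Confidence limits mean +/- z * s / sqrt(n) for a large sample."""
    se = s / math.sqrt(n)          # formula (9.3)
    return mean - z * se, mean + z * se

limits_95 = large_sample_limits(114, 17, 100, z=1.96)
limits_99 = large_sample_limits(114, 17, 100, z=2.58)

print([round(v, 2) for v in limits_95])   # [110.67, 117.33]
print([round(v, 2) for v in limits_99])   # [109.61, 118.39]
```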

Implicit in the above discussion is the assumption that the quantity
z = (X̄ − μ)/s_X̄ is normally distributed. This ratio is a deviation of a
sample mean from its population mean, divided by an estimate of the standard
deviation of the sampling distribution. It is a standard score. This
ratio is not normally distributed when N is small, but approaches the normal
form as N increases in size. It is a common statistical convention to consider
a sample of 30 or more observations as large and a sample of less than 30 as
small. This of course is highly arbitrary. It results from the fact that
where N is about 30 the differences between the shape of the distribution of
the quantity (X̄ − μ)/s_X̄ and the normal distribution of unit area and unit
standard deviation are so small that for most practical purposes they may be
ignored.


9.6. Unbiased and Biased Estimates 


Some sample statistics are spoken of as unbiased estimates of population 
values; others are biased. A sample statistic is an unbiased estimate when 
the mean of a large number of sample values, obtained by repeated sampling, 
approaches the population value as a limit. This simply means that a 
statistic is unbiased when it displays no systematic tendency to be either 
greater than or less than the population value. The sample value of the 
arithmetic mean is an unbiased estimate. The sample mean X̄ exhibits no
systematic tendency to be either greater than or less than the population
mean μ. Stated in somewhat different language, an estimate is unbiased
when its expected value is equal to the population parameter it purports to
estimate. The expected value of a statistic is the value we should expect
to obtain on averaging the statistic over an indefinitely large number of
repeated random samples. It is the mean of the theoretical sampling
distribution. The expected value of the mean, denoted by E(X̄), is the
population mean μ. A statistic is a biased estimate when the mean of repeated
sample estimates does not tend toward the population value but departs on
the average in some systematic fashion from it. The sample value of the
variance s², calculated by the formula Σ(X − X̄)²/N, is a biased estimate.
The expected value of the variance E(s²) is not equal to the population
variance σ². The estimate s² exhibits a systematic tendency to be less than
the population parameter σ². An unbiased estimate of the variance σ² is
given by

\[ \frac{N}{N - 1}\, s^2 \tag{9.4} \]

The extent of the bias is represented by the ratio N/(N − 1). If N is small,
the difference between the biased and unbiased estimate of σ² may be
appreciable. If, for example, s² = 400 and N = 5, the unbiased estimate of
σ² is 500, a substantial difference. The extent of the bias decreases as N
increases in size. For large N the bias is trivial.
If we divide the sum of squares of deviations from the mean by N − 1,
instead of N, we obtain an unbiased estimate. Thus

\[ \frac{N}{N - 1}\, s^2 = \frac{N}{N - 1} \cdot \frac{\sum (X - \bar{X})^2}{N} = \frac{\sum (X - \bar{X})^2}{N - 1} \tag{9.5} \]

The formula for the sampling variance of the arithmetic mean σ²_X̄ = σ²/N
requires an estimate of σ². If we use s², the biased estimate obtained by
dividing the sum of squares by N, the estimate of the error variance is s²/N.
The estimate of the standard error is s/√N. If an unbiased estimate of σ²
is used, the estimate of the error variance of the mean is s²/(N − 1) and the
standard error becomes s/√(N − 1).
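
The bias of the divisor-N variance can be exhibited exactly for a small case. The Python sketch below enumerates every ordered sample of three drawn with replacement from the integers 1 to 8 and averages the two variance estimates over all samples. The choice of population and sample size is an assumption made for the illustration, as are the names used; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

population = range(1, 9)   # hypothetical small population: the integers 1 to 8
N = 3                      # sample size, drawn with replacement

def variance(values, divisor):
    """Sum of squared deviations about the mean, divided by `divisor`."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / divisor

sigma_sq = variance(population, len(population))   # population variance

samples = list(product(population, repeat=N))      # all 8**3 equiprobable samples
expected_biased = sum(variance(s, N) for s in samples) / len(samples)
expected_unbiased = sum(variance(s, N - 1) for s in samples) / len(samples)

print(sigma_sq, expected_biased, expected_unbiased)   # 21/4 7/2 21/4

# The numerical example of the text: s**2 = 400 with N = 5.
print(Fraction(5, 5 - 1) * 400)                       # 500
```

The expected value of the divisor-N estimate is (N − 1)/N times the population variance, while the divisor-(N − 1) estimate has expected value equal to the population variance.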

Hitherto the symbol s² has been used to denote the sample variance
obtained by dividing the sum of squares of deviations about the mean by N
and σ² to denote the population variance. In subsequent chapters the
symbol s² will also be used to denote the unbiased variance estimate. In
every instance we indicate clearly whether s? refers to a biased or unbiased 
estimate. Some authors define the variance initially by dividing the sum of 
squares of deviations about the mean by N — 1. There are both advantages 
and disadvantages to this procedure. 


9.7. Degrees of Freedom 


The use of N − 1 instead of N to obtain an unbiased estimate of the
population variance involves a concept of some importance in
statistics. While N is the number of observations in the sample, N − 1 is
the number of degrees of freedom. The number of degrees of freedom is the
number of values of the variable that are free to vary. Consider the five
measurements 10, 14, 6, 5, and 5. Represented as deviations from the mean
of 8, these measurements become +2, +6, −2, −3, −3. The sum of
deviations about the mean is zero. In consequence, if any four of these deviations
are known, the remaining deviation is determined. Thus four of the
deviations are free to vary independently, and the number of degrees of freedom is
N − 1, or 4.
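
The restriction may be verified directly. In the short Python sketch below (variable names are assumptions), the fifth deviation is recovered from the other four by using only the fact that the deviations sum to zero.

```python
measurements = [10, 14, 6, 5, 5]
mean = sum(measurements) / len(measurements)
deviations = [x - mean for x in measurements]

# Given any four deviations, the fifth is fixed because the total must be zero.
known = deviations[:4]
implied_last = -sum(known)

print(deviations)      # [2.0, 6.0, -2.0, -3.0, -3.0]
print(implied_last)    # -3.0
```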

This type of situation may be represented in symbolic form. Let X₁,
X₂, X₃ be three measurements with mean X̄. The sum of deviations is
(X₁ − X̄) + (X₂ − X̄) + (X₃ − X̄) = 0. If X̄ and any two of the values
of X are known, the third value of X is determined. The number of degrees
of freedom here is 2. The calculation of the variance and standard deviation
requires the sum of squares of deviations about the mean, Σ(X − X̄)².
Of the values of which this sum of squares is comprised, N − 1 are free to vary
independently. The number of degrees of freedom associated with the sum
of squares is N − 1. Dividing this sum of squares by the number of degrees
of freedom associated with it, as distinct from the number of observations,
yields an unbiased estimate of the population variance σ². The symbol df is
frequently used to represent degrees of freedom.

The number of degrees of freedom depends on the nature of the problem.
In fitting a line to a series of points by the method of least squares the number
of degrees of freedom associated with the sum of squares of deviations about
the line is N − 2. If there are two points only, a straight line will fit the
points exactly and the sum of squares of deviations about the line will, of
course, be zero. No freedom of variation is possible. With three points
df = 1; with 15 points, df = 13. The equation of a straight line is


Y = bX + a


where b is the slope of the line and a is the point where it cuts the Y axis.
Both b and a are estimated from the data. It may be said that 2 degrees of
freedom are lost in estimating b and a from the data. If b, a, and any N − 2
deviations from the line are known, the remaining two deviations are
determined.
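
The two restrictions imposed by a least-squares line can be seen numerically. The Python sketch below fits Y = bX + a to a small set of hypothetical points and shows that the deviations from the line satisfy two linear conditions: they sum to zero, and their products with X sum to zero. The data and the function names are assumptions introduced for the example.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y = bX + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return b, a

def residuals(xs, ys):
    """Deviations of the observed Y values from the fitted line."""
    b, a = fit_line(xs, ys)
    return [y - (b * x + a) for x, y in zip(xs, ys)]

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
res = residuals(xs, ys)

# Two restrictions hold for any data set, so only N - 2 deviations are free.
print(sum(res), sum(x * r for x, r in zip(xs, res)))

# With two points the line fits exactly and nothing is left to vary.
print(residuals([1, 3], [2, 7]))
```

Both sums printed for the five-point example are zero apart from rounding error in floating-point arithmetic.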

The concept of degrees of freedom has a geometric interpretation. A point 
on a line is free to move in one dimension only and has 1 degree of freedom. 
A point on a plane has freedom of movement in two dimensions and has 2 
degrees of freedom. A point in a space of three dimensions has 3 degrees of
freedom. Likewise, a point in a space of k dimensions has k degrees of
freedom. It has freedom of movement in k dimensions.

The concept of degrees of freedom is widely used in statistical work and will
be discussed subsequently in connection with contingency tables and the
analysis of variance. The essence of the idea is simple. The number of
degrees of freedom is always the number of values that are free to vary, given 
the number of restrictions imposed upon the data. It seems intuitively 
obvious that in the study of variation we should concern ourselves with the 
number of values that enjoy freedom to vary within the restrictions of the 
problem situation. 


9.8. The Distribution of t 


In drawing samples from a normal population with mean μ and variance
σ², the distribution of the ratio

\[ \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \]

is normal. This ratio is in standard-score form with zero mean and unit
standard deviation. It is a deviation of a sample mean from a population
mean, divided by the standard deviation of the sampling distribution of
means. Where σ² is unknown, we estimate it from the sample data, using in
this instance an unbiased estimate. We obtain thereby an estimate of σ_X̄.
Denote this by s_X̄. We may now consider the ratio

\[ t = \frac{\bar{X} - \mu}{s_{\bar{X}}} = \frac{\bar{X} - \mu}{\sqrt{\dfrac{\sum (X - \bar{X})^2}{N(N - 1)}}} \tag{9.6} \]

This ratio contains the variable sample values X̄ and s_X̄ in the numerator and
denominator, respectively. This is a t ratio. It departs appreciably from
the normal form for small N. Its theoretical sampling distribution is called
the distribution of t. If samples of, say, 5 or 10 members are drawn from a
normal population, a value of t calculated for each sample, and a frequency
distribution of the different values of t prepared, the resulting distribution
will not be normally distributed. It will be symmetrical but leptokurtic.
The theoretical sampling distribution of t for small N is also symmetrical and
leptokurtic. It tapers off to infinity at the two extremities. It is, however,
thicker at the extremities than the corresponding normal curve. A different
t distribution exists for each number of degrees of freedom. As the number
of degrees of freedom increases, the t distribution approaches the normal form.
Figure 9.1 compares the normal distribution with the distribution of t for
various degrees of freedom.
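
The sampling experiment just described may be imitated in Python. The sketch below draws 20,000 samples of 5 from a normal population, computes the t ratio of formula (9.6) for each, and reports the proportion of values falling beyond ±1.96 and beyond ±2.78, the latter being the 95 per cent point of t for 4 degrees of freedom. The population values, the number of samples, and the random seed are assumptions made for the illustration.

```python
import math
import random

def t_ratio(sample, mu):
    """Formula (9.6): deviation of the sample mean over its estimated standard error."""
    n = len(sample)
    m = sum(sample) / n
    unbiased_var = sum((x - m) ** 2 for x in sample) / (n - 1)
    return (m - mu) / math.sqrt(unbiased_var / n)

rng = random.Random(7)        # arbitrary seed (an assumption)
mu, sigma, n = 0.0, 1.0, 5    # hypothetical normal population; samples of 5
t_values = [t_ratio([rng.gauss(mu, sigma) for _ in range(n)], mu)
            for _ in range(20000)]

# Proportion of t values beyond the normal-curve limit and beyond the t limit.
beyond_normal_limit = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
beyond_t_limit = sum(abs(t) > 2.78 for t in t_values) / len(t_values)
print(beyond_normal_limit, beyond_t_limit)
```

For samples this small, markedly more than 5 per cent of the t values fall beyond ±1.96, which is the thickening of the extremities referred to above; about 5 per cent fall beyond ±2.78.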

Hitherto we have considered two theoretical model frequency distributions, 
the binomial distribution and the normal distribution. The t distribution is
a third theoretical model distribution with wide application to many sampling 
problems. It was developed originally in 1908 by W. S. Gosset who wrote 
under the pen name “Student.” 




In sampling problems the t distribution is used in a manner directly
analogous to the normal distribution. In the normal distribution 95 per cent
of the total area under the curve falls within plus and minus 1.96 standard
deviation units from the mean and 5 per cent of the area falls outside these
limits. Likewise, 99 per cent of the area under the normal curve falls within
plus and minus 2.58 standard deviation units from the mean and 1 per cent
of the area falls outside these limits. In the t distribution, the distances
along the base line of the curve that include 95 per cent and 99 per cent of the
total area are different for different numbers of degrees of freedom. It is
customary in tabulating areas under the t curve to use degrees of freedom, df,
instead of N. While the df associated with the sample variance is N − 1,
the df associated with other statistics may be N − 2, N − 3, and the like.
Consequently, tables of t by degrees of freedom instead of N are more
generally applicable. The distances from the mean, measured along the base
line of the t distribution, that include 95 per cent and 99 per cent of the total
area (analogous to the 1.96 and 2.58 of the normal distribution) for selected
degrees of freedom, are as follows:


  df      95%      99%
   1    12.71    63.66
   2     4.30     9.93
   3     3.18     5.84
   4     2.78     4.60
   5     2.57     4.03
  10     2.23     3.17
  15     2.13     2.95
  20     2.09     2.85
  30     2.04     2.75
 120     1.98     2.62
   ∞     1.96     2.58


Note that as the number of degrees of freedom approaches infinity, t
approaches the values 1.96 and 2.58. The difference between t for about 30
degrees of freedom and t for an indefinitely large number of degrees of
freedom is sometimes interpreted for practical purposes as trivial. A more
complete tabulation of t is given in Table B of the Appendix. A distinction
is often made between large and small sample statistics. This distinction
resides in the fact that the normal distribution is frequently found to be an
appropriate model for use with sampling problems involving large samples.
With small samples the distribution of t provides for many statistics a more
appropriate model.


[Figure: curves of f(t), relative frequency, plotted against t.]

FIG. 9.1. Distribution of t for various degrees of freedom. (From D. Lewis, Quantitative
methods in psychology, Iowa City, Iowa. Published by the author, 1948.)


9.9. Confidence Intervals of Means for Small Samples 


The line of reasoning used in determining confidence intervals for small
samples is similar to that for large samples. With small samples, however,
the distribution of t is used instead of the normal distribution in fixing the
limits of the interval. For large samples the 95 and 99 per cent confidence
intervals for the mean are given, respectively, by $\bar{X} \pm 1.96 s_{\bar{X}}$ and
$\bar{X} \pm 2.58 s_{\bar{X}}$. For small samples an unbiased estimate of $\sigma^2$ is used in
estimating the standard error. The value of t used in fixing the limits of the
95 and 99 per cent intervals will vary, depending on the number of degrees of
freedom. Consider an example where $\bar{X} = 24.26$, $s^2 = 64$, N = 16, and
df = 16 − 1 = 15. On reference to Table B of the Appendix we observe that for 15
degrees of freedom 95 per cent of the area of the distribution falls within a t of
±2.13 from the mean. The standard error using the unbiased variance estimate
is $8/\sqrt{15} = 2.07$. The 95 per cent confidence limits are given by
$24.26 \pm 2.13 \times 8/\sqrt{15}$. These limits are 19.86 and 28.66. We may assert with
95 per cent confidence that the population mean falls within these limits.
The 99 per cent limits are given by $24.26 \pm 2.95 \times 8/\sqrt{15}$. These limits
are 18.17 and 30.35.
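The arithmetic of this example may be sketched as follows. The function name is illustrative, and the t values 2.13 and 2.95 are simply those read from the table for 15 degrees of freedom. Because the computation carries full precision, its results can differ in the second decimal place from figures worked by hand with rounded intermediate values.

```python
import math

def mean_confidence_limits(mean, biased_variance, n, t_value):
    """Limits mean ± t * s / sqrt(N - 1), where s^2 is the variance with divisor N."""
    standard_error = math.sqrt(biased_variance / (n - 1))
    half_width = t_value * standard_error
    return mean - half_width, mean + half_width

# Values from the example in the text; t values for df = 15 from the table of t.
lower95, upper95 = mean_confidence_limits(24.26, 64, 16, 2.13)
lower99, upper99 = mean_confidence_limits(24.26, 64, 16, 2.95)
print(round(lower95, 2), round(upper95, 2))
print(round(lower99, 2), round(upper99, 2))
```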


9.10. Standard Error and Confidence Intervals of Proportions 


Many problems require the use of proportions and percentages. A percentage
is a proportion multiplied by 100. The sampling distribution of a
proportion is approached through the binomial. To illustrate, consider a
barrel containing a number of black and white chips. Denote the proportion
of black and white chips by $\theta$ and $1 - \theta$, respectively. Let us draw a large
number of samples of size N at random with replacement from the barrel,
observe the proportion of black chips in each sample, and make a frequency
distribution of these proportions. This frequency distribution is an experimental
sampling distribution of proportions for samples of size N from a
population where the population proportion is $\theta$. The expected, or theoretical,
distribution of the number, as distinct from the proportion, of black chips
in the samples is given by expanding the binomial $[(1 - \theta) + \theta]^N$. The
mean and standard deviation of this distribution are $N\theta$ and $\sqrt{N\theta(1 - \theta)}$,
respectively. To obtain the standard deviation of the expected distribution
of the proportion of black chips in samples of size N, as distinct from the
number, we divide $\sqrt{N\theta(1 - \theta)}$ by N and obtain

$$\sigma_p = \sqrt{\frac{\theta(1 - \theta)}{N}} \tag{9.7}$$


This is the standard error of a proportion. To illustrate, let $\theta = .25$ and
$1 - \theta = .75$. The expected distribution of the number of black chips in
samples of size 10 is given by expanding the binomial $(.25 + .75)^{10}$. The
mean in this example is $10 \times .25 = 2.5$, and the standard deviation is
$\sqrt{10 \times .25 \times .75} = 1.37$. The standard deviation of the expected distribution
of the proportion of black chips in samples of size 10 is obtained by
dividing 1.37 by 10 and is .137. This assumes that $\theta$ is known. In practice,
$\theta$ is not known and the sample value p is used as an estimate of $\theta$. The
estimate of the standard error of a proportion is then given by

$$s_p = \sqrt{\frac{p(1 - p)}{N}} = \sqrt{\frac{pq}{N}} \tag{9.8}$$

where 1 − p = q. Also, it may be readily shown that the standard error of a
per cent is given by

$$s_{\%} = 100\sqrt{\frac{pq}{N}} \tag{9.9}$$
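The illustration with a population proportion of .25 and samples of size 10 can be verified by writing out the terms of the binomial expansion. This is a sketch only; the function and variable names are illustrative.

```python
import math

def binomial_distribution(n, theta):
    """Terms of the expansion [(1 - theta) + theta]^n: P(k black chips), k = 0..n."""
    return [math.comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]

n, theta = 10, 0.25
probs = binomial_distribution(n, theta)

# Mean and standard deviation of the number of black chips per sample.
mean_count = sum(k * p for k, p in enumerate(probs))
sd_count = math.sqrt(sum((k - mean_count) ** 2 * p for k, p in enumerate(probs)))

# Dividing by N converts the standard deviation of the number to that of the proportion.
sd_proportion = sd_count / n

print(round(mean_count, 2), round(sd_count, 2), round(sd_proportion, 3))
```

The mean and standard deviation computed term by term agree with the shortcut expressions for the mean and standard deviation of the binomial.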


If it can be assumed that the sampling distribution of a proportion can be
approximately represented by a normal distribution, then the 95 and 99 per
cent confidence limits for a proportion are given by $p \pm 1.96 s_p$ and $p \pm 2.58 s_p$,
respectively. Whether or not the sampling distribution can be represented
by a normal distribution depends both on the size of the sample and on the
value of p. For any given value of N the sampling distribution of a proportion
becomes increasingly skewed as p and q depart from .50. Quite clearly,
the formula for the standard error of a proportion should not be used with
reference to a normal curve for extreme values of p and q. It has been
suggested that the formula for the standard error of a proportion should
be used only where Np, or Nq, whichever is the smaller, is equal to or greater
than 5. Thus where p = .10 and N = 20, Np = 2. The use of the formula
$s_p = \sqrt{pq/N}$ would be considered inappropriate here. Where p = .10 and
N = 100, Np = 10. Presumably, here the differences between the binomial
and the normal distribution are quite small and can safely be ignored.
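The rule of thumb and the normal-approximation limits may be combined in a short sketch. The function name and the choice to return no limits when the rule is not met are assumptions made for illustration.

```python
import math

def proportion_confidence_limits(p, n, z=1.96, minimum_expected=5):
    """Normal-approximation limits p ± z * sqrt(pq/N).
    Returns None when the smaller of Np and Nq is below the suggested minimum."""
    q = 1 - p
    if min(n * p, n * q) < minimum_expected:
        return None            # rule of thumb: normal approximation not advised
    s_p = math.sqrt(p * q / n)
    return p - z * s_p, p + z * s_p

print(proportion_confidence_limits(0.10, 20))    # Np = 2, rejected by the rule
print(proportion_confidence_limits(0.10, 100))   # Np = 10, limits returned
```

Passing z = 2.58 in place of the default gives the 99 per cent limits.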


9.11. Standard Errors and Confidence Intervals of Other Statistics 


Where N is large the standard error of the median for samples of size N
drawn from a normal population with standard deviation $\sigma$ is given approximately
by

$$\sigma_{mdn} = \frac{1.253\sigma}{\sqrt{N}} \tag{9.10}$$


The standard error of the median is about 1.25 times as large as the standard 
error of the mean. In consequence, the sample mean for normal populations 
is a more efficient estimate of the population mean than the sample median is 
of the population median. If the biased sample estimate of $\sigma^2$ is used, the
estimate of the standard error of the median becomes

$$s_{mdn} = \frac{1.253 s}{\sqrt{N - 1}} \tag{9.11}$$


Confidence limits at the 95 and 99 per cent levels may be located by taking 
$\pm 1.96 s_{mdn}$ and $\pm 2.58 s_{mdn}$ about the sample median. The above formulation
assumes normality of the parent population and a large N. In many situa- 
tions where the median is used the distribution of the variable is not normal. 
This, indeed, is one of the reasons for using the median instead of the mean. 
In consequence the above formulation is of limited use. Confidence intervals 
for the median involving no assumptions about the shape of the distribution 
of the variable in the population, other than its continuity, have been worked 
out by Nair (1940). His method is described by Kenney and Keeping (1954) 
and Johnson (1949). Given N observations arranged in ascending order,
$X_1 < X_2 < \cdots < X_N$, the median is the middle value. The problem of
fixing, say, 95 per cent limits is approached by locating two values of $X_i$ in
this ascending series such that the probability that these values will include 
the population median is not less than .95. 
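One common way of locating such a pair of ordered values rests on the binomial with probability 1/2, since each observation from a continuous population is equally likely to fall above or below the population median. The sketch below follows that usual order-statistic construction; it is offered as an illustration and may differ in detail from the tables prepared by Nair. The function name is hypothetical.

```python
import math

def median_interval_ranks(n, confidence=0.95):
    """Ranks (k, n - k + 1) of the ordered observations that bracket the population
    median with probability at least `confidence`; None if n is too small."""
    best = None
    for k in range(1, n // 2 + 1):
        # The interval (X_k, X_{n-k+1}) misses the median when fewer than k
        # observations fall on one side of it; each side has probability 1/2.
        tail = sum(math.comb(n, j) for j in range(k)) / 2**n
        coverage = 1 - 2 * tail
        if coverage >= confidence:
            best = (k, n - k + 1, coverage)   # keep the narrowest interval that qualifies
    return best

print(median_interval_ranks(20))
```

For very small N no pair of ordered values reaches the required probability, and the function returns None.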

The standard error of the standard deviation for large samples from a
normal population is given by

$$\sigma_s = \frac{\sigma}{\sqrt{2N}} \tag{9.12}$$

Where a biased estimate of $\sigma$ is used we obtain

$$s_s = \frac{s}{\sqrt{2N}} \tag{9.13}$$

The 95 and 99 per cent confidence limits can readily be obtained by taking
$s \pm 1.96 s_s$ and $s \pm 2.58 s_s$, respectively. In using this formula a large sample
should be regarded as substantially greater than 30. The method of determining
confidence limits for s based on small samples, and indeed the method
which is perhaps most appropriate in all cases regardless of size of N, involves
a knowledge of the distribution of chi square, or $\chi^2$. For a simple discussion
of this method see Freund (1952) or Johnson (1949). The application of $\chi^2$
to a variety of statistical problems will be discussed in Chap. 12.
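A brief sketch of the large-sample limits for a standard deviation follows. The sample values are hypothetical and chosen only for illustration; the function name is likewise an assumption.

```python
import math

def sd_confidence_limits(s, n, z=1.96):
    """Large-sample limits s ± z * s / sqrt(2N) for a standard deviation."""
    s_s = s / math.sqrt(2 * n)
    return s - z * s_s, s + z * s_s

# Hypothetical values chosen for illustration only.
low, high = sd_confidence_limits(s=18.0, n=400)
print(round(low, 2), round(high, 2))
```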


EXERCISES 


1. Indicate the difference between (a) an experimental and a theoretical sampling
distribution, (b) the theoretical and operational meanings of randomness, (c) biased and
unbiased estimates, (d) finite and infinite populations, (e) large and small samples,
(f) point and interval estimates, (g) N and df.

2. The standard deviation of the sampling distribution of $\bar{X}$ is $\sigma/\sqrt{N}$. What is the
standard deviation of the sampling distribution of $N\bar{X}$ or $\Sigma X$?

3. Samples of three cards are drawn at random from a population of eight cards numbered
from 1 to 8. Obtain (a) the theoretical sampling distribution of means, (b) the
standard deviation of the sampling distribution, (c) the probability of obtaining a mean
equal to or greater than 7.

4. Random samples of size 50 are drawn from a normal population with $\mu = 40$ and $\sigma = 14$.
Estimate what proportion of samples will have a mean (a) less than 38, (b) less
than 36, (c) greater than 41, (d) between 36 and 44.

5. A random sample of 400 observations has a mean of 50 and a standard deviation of 18.
Estimate the 95 and 99 per cent confidence limits for the mean.

6. How is the standard error of the mean affected by tripling sample size?

7. Estimate for the following data the 95 and 99 per cent confidence intervals for means.

    | $\bar{X}$ | N  | $\Sigma(X - \bar{X})^2$
(a) | 26.2 | 7  | 77.0
(b) | 58.3 | 11 | 249.0
(c) | 46.3 | 25 | 1,525.0
(d) | 84   | 16 | 444.7

8. What proportion of the area of the t distribution falls (a) above t = 3.169 where df =
10, (b) below t = −1.725 where df = 20, (c) between t = ±3.659 where df = 29, (d)
between t = 2.131 and t = 2.602 where df = 15, (e) between t = −4.541 and t =
3.182 where df = 3?

9. Obtain the theoretical sampling distribution of proportions in drawing samples of
eight from a population where the population value of the proportion is .60. What is
the standard deviation of the sampling distribution?

10. Estimate the 95 and 99 per cent confidence limits for p = .75 where N = 169.


CHAPTER 10


TESTS OF SIGNIFICANCE 


10.1. Introduction 


In Chap. 9 we considered the sampling error associated with single 
sample values. Sampling distributions, standard errors of single sample 
values, and confidence intervals were discussed. In practical statistical 
work in psychology we are infrequently concerned with simply describing the 
magnitude of error associated with single sample values. Experimental data 
very often require a comparison and evaluation of two or more means, pro- 
portions, standard deviations, or other statistics obtained from separate 
samples or from the same sample for measurements obtained under two or 
more experimental conditions. To illustrate, an investigator may wish to 
explore the effects of a tranquilizing drug on the estimation of time intervals 
as part of a study on time perception. He may administer a drug to an 
experimental group of subjects and a placebo, an inactive simulation of the 
drug, to a control group and measure the errors in time estimation made by 
the two groups. The mean error for the two groups may be calculated. The 
experiment requires an evaluation of the difference between these two means. 
Both means are subject to sampling error. May the difference between the 
two means be probably ascribed to sampling error, or may it be argued with 
confidence that the drug affects time perception? A decision is required 
between these alternatives. Statistical procedures which lead to decisions 
of this kind are known as tests of significance. 

Tests of significance may be applied to the difference between statistics 
calculated on independent samples or between statistics obtained under 
different conditions for the same sample. Sometimes a test of significance is 
applied to test the difference between a single sample statistic and a fixed 
value. An example is the procedure used to test whether a correlation 
coefficient is significantly different from zero. In this case the fixed value is 
zero. While many tests of significance involve a comparison of two sample 
statistics, or a single sample statistic and a fixed value, such tests can readily 
be extended to cover situations where more than two sample statistics are 
involved. For example, in the experiment mentioned above on the effects of 
a drug on time perception, the experiment could be designed to include the 
administration of the drug in different dosages to different groups of subjects.
Three or four or five different dosages might be used, resulting in three or
four or five different means. The means could be compared two at a time
to ascertain whether or not the differences between them could be attributed
to sampling error. A more efficient form of analysis, the analysis of variance,
discussed in Chap. 15, provides a procedure for making an over-all test in 
this type of situation. 


10.2. The Null Hypothesis 


Consider an experiment using an experimental and a control group. A
treatment is applied to the experimental group. The treatment is absent
for the control group. Measurements are made on both groups. Presumably
any significant difference between the two groups can be ascribed with
confidence to the treatment and to no other cause. Let $\bar{X}_1$ and $\bar{X}_2$ be the
means for the experimental and the control group, respectively. Both means
are subject to sampling error. The means $\bar{X}_1$ and $\bar{X}_2$ are estimates of the
population means $\mu_1$ and $\mu_2$. The trial hypothesis may be formulated that
no difference exists between $\mu_1$ and $\mu_2$. This hypothesis is a null hypothesis
and may be written

$$H_0: \mu_1 - \mu_2 = 0 \tag{10.1}$$

The symbol $H_0$ represents the null hypothesis. Very simply, this hypothesis
asserts that no difference exists between the two population means. Note
that the statement $\mu_1 - \mu_2 = 0$ is the same as $\mu_1 = \mu_2$. Thus an alternative
formulation of the hypothesis is to assert that the two samples are
drawn from populations having the same mean. In general, regardless of
the particular statistics used, the null hypothesis is a trial hypothesis asserting
that no difference exists between population parameters. Thus a null
hypothesis about two variances would take the form $H_0: \sigma_1^2 - \sigma_2^2 = 0$, or
$H_0: \sigma_1^2 = \sigma_2^2$.

The logical steps used by an investigator in applying a test of significance
are these. First, he assumes the null hypothesis; that is, he operates on the
trial hypothesis that the treatment applied will have no effect. Second, he
examines the empirical data. Where the hypothesis pertains to two means
he examines the difference between the two means, $\bar{X}_1 - \bar{X}_2$. Third, the
question is asked, what is the probability of obtaining a difference equal to or
greater than the one observed in drawing samples at random from populations
where the null hypothesis is assumed to be true? In the case of two means,
what is the probability of obtaining a difference equal to or greater than
$\bar{X}_1 - \bar{X}_2$ in drawing random samples from populations where $\mu_1 - \mu_2 = 0$?
Fourth, if this probability is small, the observed result being highly improbable
on the basis of the null hypothesis, the investigator may be prepared to reject
the null hypothesis. This means that the observed difference cannot
reasonably be explained by sampling error and presumably may be attributed
to the treatment applied. Thus the result may be said to be significant. If
this probability cannot be considered small and the observed result is not
highly improbable, then sampling error may account for the difference
observed. Hence we cannot with confidence infer that the difference results
from the treatment applied.
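The third step can be pictured by simulation. The sketch below is an illustration under stated assumptions, not a procedure given in the text: both populations are taken to be normal with the same mean and a known standard deviation, a large number of pairs of samples is drawn, and the proportion of differences at least as large as the observed one, in either direction, is counted. All figures and names are hypothetical.

```python
import random
import statistics

def null_difference_probability(observed_diff, n1, n2, sigma, trials=10000, seed=0):
    """Proportion of simulated differences between means, drawn with the null
    hypothesis true, that are at least as large in absolute value as the observed one."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        mean1 = statistics.fmean([rng.gauss(0, sigma) for _ in range(n1)])
        mean2 = statistics.fmean([rng.gauss(0, sigma) for _ in range(n2)])
        if abs(mean1 - mean2) >= abs(observed_diff):
            count += 1
    return count / trials

# Hypothetical figures: two groups of 25, population sd 10, observed difference 6.
p = null_difference_probability(observed_diff=6.0, n1=25, n2=25, sigma=10.0)
print(p, "reject H0 at .05" if p <= 0.05 else "do not reject H0 at .05")
```

In practice the probability is obtained from a theoretical sampling distribution rather than by simulation; the simulation only makes the underlying question concrete.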

How small should the probability be of obtaining the difference observed, 
the null hypothesis being assumed, before we can reject the null hypothesis 
and regard the difference as significant? Here the statistician imposes a 
fairly rigorous standard. It is conventional to accept probabilities of either 
.05 or .01 as standards of significance. If the probability is equal to or less
than .05 that the difference observed could result from sampling error, 
then the difference is said to be significant at the .05, or 5 per cent, level or less. 
Here the chances are 5 in 100 or less that the difference could result when the 
treatment applied was having no effect. If the probability is .01 or less, the 
difference is said to be significant at the .01, or 1 per cent, level or better. 
The .05 and .01 probability levels are descriptive of our degree of confidence 
that a real difference exists, or that the observed difference is not due to the 
caprice of sampling. Usually in evaluating an experimental result it is 
unnecessary to determine the probabilities with a high degree of accuracy. 
For most practical purposes it is sufficient to designate the probability
as p < .05, or p < .01, or possibly p < .001 if the result is highly
significant. 

If the probability does not reach the level required for significance, can we 
regard the null hypothesis as true? The answer is no. We may fail to reject
the null hypothesis, but this does not mean that the null hypothesis is true. 
Many other alternative hypotheses may be formulated which on the basis of 
the experimental evidence available also cannot be rejected. For example, 
in comparing two means, instead of the null hypothesis, $H_0: \mu_1 - \mu_2 = 0$, we
may formulate the hypothesis $H_1: \mu_1 - \mu_2 = .00001$. In nearly all experimental
situations the experimental data will allow no choice between two
such hypotheses as these. Clearly, an indefinitely large number of alter- 
native hypotheses exist, in addition to the null hypothesis, which on the basis 
of any particular bit of experimental evidence cannot be rejected. Science 
proceeds by the conduct of experiments which enable the rejection of the null 
hypothesis at accepted levels of significance. To rigorously demonstrate 
the truth of the null hypothesis is a logical impossibility. 


10.3. Sampling Distribution of Differences 


A test of the significance of the difference between two sample statistics
requires a knowledge of the sampling distribution of differences. Here we
consider the distribution of the differences between two statistics, say, two
sample means, with repeated random sampling. Conceptualize two indefinitely
large populations whose means are equal; that is, $\mu_1 = \mu_2$. Let $\bar{X}_1$ be
the mean of a sample of $N_1$ cases drawn at random from the first population
and $\bar{X}_2$ be the mean of a sample of $N_2$ cases drawn from the second population.
The difference between means is $\bar{X}_1 - \bar{X}_2$. Since $\mu_1 = \mu_2$ this difference
results from sampling error. A large number of pairs of samples may be
drawn, and a frequency distribution made of the differences. It describes
how the differences between means chosen at random from two populations,
where $\mu_1 = \mu_2$, will vary with repeated sampling. From this distribution
we may estimate the probability of obtaining a difference of any specified
size in drawing samples at random from populations where $\mu_1 = \mu_2$. By
considering an indefinitely large number of pairs of samples we arrive at the
concept of a theoretical sampling distribution of differences between sample
means. In this situation the individual measurements in the two populations
are not paired with one another. The samples are independent. The means
may be viewed as paired at random. No correlation exists between the pairs
of means.

Consider now a situation where measurements are paired with one another. 
Such data arise, for example, where measurements are made on the same 
group of subjects under control and experimental conditions. The paired 
measurements may be correlated. In approaching the sampling distribution 
of differences between means in this instance we conceptualize two populations
of paired measurements with equal means; thus $\mu_1 = \mu_2$. Denote the
correlation between the paired measurements by the symbol $\rho_{12}$. Samples of
size N are drawn at random, and the differences between means obtained. 
The distribution of differences between means for an indefinitely large 
number of samples is the sampling distribution of differences for correlated 
populations. 

A clear distinction is made between tests of significance which are appropri- 
ate for independent samples, where no basis exists for pairing the observa- 
tions, and tests appropriate for correlated samples, where a basis exists for 
pairing the observations one with another. 

The variance of the sampling distribution of differences describes how the
differences vary with repeated sampling. Consider the case of independent
samples. If $\sigma_{\bar{X}_1}^2 = \sigma_1^2/N_1$ is the variance of the sampling distribution of
means drawn from one population and $\sigma_{\bar{X}_2}^2 = \sigma_2^2/N_2$ the corresponding
variance from the other population, then the variance of the sampling
distribution of differences between means is the sum of the two variances.
Thus

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2 = \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} \tag{10.2}$$

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the variances in the two populations being equal, we
may write

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma^2 \left( \frac{1}{N_1} + \frac{1}{N_2} \right)$$

For correlated samples the variance of the sampling distribution of differences
may be shown to be

$$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = \sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2 - 2\rho_{12}\sigma_{\bar{X}_1}\sigma_{\bar{X}_2} \tag{10.3}$$

where $\rho_{12}$ is the correlation in the population. Note that the formula for
independent samples is a particular case of the more general formula for
correlated samples. It is the particular case which arises when $\rho_{12} = 0$. In
the correlated case $N_1 = N_2 = N$.
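That the independent case is the correlated case with zero correlation can be seen in a small sketch. The population values used are hypothetical, and the function name is illustrative.

```python
import math

def se_difference(sigma1, n1, sigma2, n2, rho=0.0):
    """Standard error of the difference between two means.
    rho is the population correlation between paired measurements;
    rho = 0 gives the independent-samples case."""
    var1 = sigma1**2 / n1          # variance of the sampling distribution of the first mean
    var2 = sigma2**2 / n2          # variance of the sampling distribution of the second mean
    return math.sqrt(var1 + var2 - 2 * rho * math.sqrt(var1 * var2))

# Hypothetical population values for illustration.
print(se_difference(10, 25, 10, 25))            # independent samples
print(se_difference(10, 25, 10, 25, rho=0.6))   # correlated samples, N1 = N2 = N
```

A positive correlation reduces the standard error of the difference, which is why the distinction between independent and correlated samples matters for the test.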


10.4. Two-tailed and One-tailed Tests of Significance 


The rejection of the null hypothesis $H_0$ implies the acceptance of an alternative
hypothesis $H_1$. In the case of means, $H_0: \mu_1 = \mu_2$. If the experimental
evidence warrants the rejection of $H_0$, we accept one of the alternative
hypotheses, $H_1: \mu_1 \neq \mu_2$ or $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$. Thus if we reject
the null hypothesis, we accept one of three alternative hypotheses: (1) the
two means are different from each other; (2) the first mean is greater than the
second; (3) the first mean is less than the second.

A test leading to the acceptance of $H_1: \mu_1 \neq \mu_2$ is known as a two-tailed
test. This test asserts that the two means are different. No assertion is
made about the direction of the difference. This test uses the two tails of 
the sampling distribution. Consider the 5 per cent significance level. If 
the sampling distribution is normal, 2.5 per cent of the area of the curve falls 
to the right of 1.96 standard deviation units above the mean and 2.5 per cent 
falls to the left of 1.96 standard deviation units below the mean. The area 
beyond these two limits is 5 per cent of the total area under the curve. The 
chances are 2.5 in 100 of getting a difference of 1.96 standard deviation units 
in one direction due to chance factors alone and 2.5 in 100 in the other direc- 
tion. Hence the chances in either direction are 5 in 100. Thus for a two- 
tailed test where the sampling distribution is normal, the observed difference 
must be equal to or greater than 1.96 times the standard deviation of the 
sampling distribution of differences for significance at the 5 per cent level. 
For significance at the 1 per cent level the ratio of the observed difference to 
its standard error should be equal to or greater than 2.58 for a two-tailed 
test. A two-tailed test is appropriate where concern is with the absolute 
magnitude of the difference, that is, with the difference regardless of sign. 

A test leading to the acceptance of either $H_1: \mu_1 > \mu_2$ or $H_1: \mu_1 < \mu_2$ is
known as a one-tailed test. This test asserts that one mean is either greater
than or less than the other. An assertion is made about the direction of the
difference. A one-tailed test uses one tail only of the sampling distribution.
If that distribution is normal, 5 per cent of the area of the curve falls beyond 1.64
standard deviation units above the mean or below the mean. To argue at
the 5 per cent level for the acceptance either of the hypothesis that $\mu_1$ is
greater than $\mu_2$ or that $\mu_1$ is less than $\mu_2$, the difference between the two
sample means should be equal to or greater than 1.64 times the standard error
of the difference. As the name implies, for a one-tailed test one tail only of
the appropriate sampling distribution is used. A one-tailed test is used
where the investigator's a priori speculation predicts a difference in one
direction only. For significance at the 1 per cent level with a one-tailed test
the ratio of the observed difference to its standard error should be equal to or
greater than 2.33.
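The four critical ratios quoted in this section, 1.96 and 2.58 for two tails and 1.64 and 2.33 for one tail, can be recovered from the normal curve. The following sketch uses the normal distribution supplied with Python's standard library; the function name is illustrative.

```python
from statistics import NormalDist

def critical_ratio(alpha, tails):
    """Ratio of a difference to its standard error required for significance
    at level alpha, assuming a normal sampling distribution."""
    # A two-tailed test places alpha / 2 in each tail; a one-tailed test places
    # the whole of alpha in a single tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

for alpha in (0.05, 0.01):
    print(alpha, round(critical_ratio(alpha, 2), 2), round(critical_ratio(alpha, 1), 2))
```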

Whether a two-tailed or one-tailed test is used resides in the question put 
to the data by the investigator. Arguments about the choice of test are 
not essentially statistical and stem from lack of clarity about the scientific 
question put to the data. The choice of test should be independent of the 
data. It should be made preferably when the experiment is in process of 
design. Usually it is inadmissible for an investigator to shift from a two-tailed
to a one-tailed test after examining the data in order to achieve significance
at some desired level. This may prove to be statistical sophistry.
If we are interested in comparing two methods of memorizing nonsense 
syllables or two methods of teaching a school subject and wish to reach a 
decision as to which one of the two methods is the better, a two-tailed test is 
appropriate. We are interested in a difference in either direction and may 
have no prior grounds for predicting the direction of the difference. If, 
however, we are interested in a particular form of psychotherapy and wish to 
compare the possible advantages of that therapy with a situation where no 
therapy at all is applied, then a one-tailed test will probably be appropriate. 
Presumably, in this type of experimental situation the investigator has prior 
reason to believe that a difference, if it does occur, will occur in a particular 
direction. If the difference turned out to be significant in the opposite 
direction, the investigator would of course find himself at something of a 
logical impasse. Where the investigator is in doubt about the choice of test, 
a two-tailed, and not a one-tailed, test should be used. 


10.5. Significance of the Difference between Two Means for 
Independent Samples 


Let $\bar{X}_1$ and $\bar{X}_2$ be two sample means based on $N_1$ and $N_2$ cases, respectively.
We proceed by combining the data for the two samples to obtain the best
unbiased estimate of the population variance. This estimate is obtained by
adding together the two sums of squares of deviations about the two sample
means and dividing this by the total number of degrees of freedom. This
unbiased estimate of the population variance may be written as

$$s^2 = \frac{\Sigma(X - \bar{X}_1)^2 + \Sigma(X - \bar{X}_2)^2}{N_1 + N_2 - 2} \tag{10.5}$$

In this case the total number of degrees of freedom on which $s^2$ is based is
$N_1 + N_2 - 2$. We lose two degrees of freedom because deviations are taken
separately about the means of the two samples. The unbiased variance
estimate $s^2$ is used to obtain an estimate of the standard error of the difference
between the two means. Thus

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s^2}{N_1} + \frac{s^2}{N_2}} \tag{10.6}$$

The difference between means, $\bar{X}_1 - \bar{X}_2$, is then divided by this estimate of
the standard error to obtain the ratio

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{(s^2/N_1) + (s^2/N_2)}} \tag{10.7}$$

This ratio has a distribution of t with $N_1 + N_2 - 2$ degrees of freedom. The
values of t required for significance at the .05 and .01 levels will vary, depending
on the number of degrees of freedom, and may be obtained by consulting
Table B of the Appendix.

Let the following be error scores obtained for two groups of experimental
animals in running a maze under different experimental conditions.

Group A | 16 | 9 | 4 | 23 | 19 | 10 | 5 | 2
Group B | 20 | 5 | 1 | 16 |  2 |  4

The following statistics are calculated from these data:

                        | Group A | Group B
N                       | 8       | 6
$\Sigma X$              | 88      | 48
$\bar{X}$               | 11      | 8
$\Sigma(X - \bar{X})^2$ | 404     | 318
The unbiased estimate of the variance is

$$s^2 = \frac{404 + 318}{8 + 6 - 2} = 60.17$$

The t ratio is then

$$t = \frac{11 - 8}{\sqrt{60.17/8 + 60.17/6}} = .72$$

The number of df in this example is 8 + 6 − 2 = 12. For df = 12, a t equal
to 2.179 is required for significance at the .05 level. In this example the
difference between means is not significant. No adequate grounds exist for
rejecting the null hypothesis. We are not justified in drawing the inference
from these data that the two experimental conditions are exerting a differential
effect on the behavior of the animals.
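The computation for the maze example can be sketched from the summary statistics alone. The function name is illustrative; the critical value 2.179 is the one quoted in the text for 12 degrees of freedom.

```python
import math

def pooled_t(n1, mean1, ss1, n2, mean2, ss2):
    """t ratio for two independent samples, from their sizes, means, and
    sums of squared deviations about their own means."""
    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df                    # unbiased variance estimate
    se_diff = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return (mean1 - mean2) / se_diff, df, pooled_variance

# Summary statistics for Group A and Group B as given in the text.
t, df, s2 = pooled_t(8, 11, 404, 6, 8, 318)
print(round(s2, 2), round(t, 2), df)
print("significant at .05" if abs(t) >= 2.179 else "not significant at .05")
```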

The t test described here assumes that the distributions of the variable in
the populations from which the samples are drawn are normal. It assumes
also that these populations have equal variances. This latter condition is
referred to as homogeneity of variance, or homoscedasticity. The t test should
be used only when there is reason to believe that the population distributions
do not depart too grossly from the normal form and the population variances
do not differ markedly from equality. Tests of normality and homogeneity
of variance may be applied, but these tests are not very sensitive for small
samples.


10.6. Significance of the Difference between Two Means for 
Correlated Samples 


Consider a situation where a single group of subjects is studied under two 
separate experimental conditions. The data may, for example, be auto- 
nomic-response measures under stress and nonstress, or measures of motor 
performance in the presence or absence of a drug. The data are comprised 
of pairs of measurements. These may be correlated. This circumstance 
leads to a test of significance between means different from that for inde- 
pendent samples. A procedure for testing significance may be applied with- 
out actually computing the correlation coefficient between the paired observa- 
tions. This method is sometimes called the difference method. Its nature 
is simply described. Given a set of N paired observations, the difference 
between each pair may be obtained. Denote any pair of observations by
$X_1$ and $X_2$ and the difference between any pair $X_1 - X_2$ as D. The mean
difference over all pairs is $(\Sigma D)/N = \bar{D}$. It is readily observed that the
difference between the means of the two groups of observations is equal to
the mean difference. The difference between any pair of observations is
$X_1 - X_2 = D$. Summing over N pairs yields $\Sigma X_1 - \Sigma X_2 = \Sigma D$. Dividing
by N we obtain $\bar{X}_1 - \bar{X}_2 = \bar{D}$. Since the mean difference is the difference
between the two means, we may test for the significance of the differences
between means by testing whether or not $\bar{D}$ is significantly different from
zero. Here in effect we treat the D's as a variable and test the difference
between the mean of this variable and zero.

The variance of the D's is given by

$$s_D^2 = \frac{\Sigma D^2}{N} - \bar{D}^2 \tag{10.8}$$

An estimate of the variance of the sampling distribution of $\bar{D}$, using an
unbiased estimate of the population variance, is

$$s_{\bar{D}}^2 = \frac{s_D^2}{N - 1}$$

The appropriate t ratio is obtained by dividing $\bar{D}$ by $s_{\bar{D}}$. Thus

$$t = \frac{\bar{D}}{s_{\bar{D}}} = \frac{\bar{D}}{\sqrt{s_D^2/(N - 1)}} \tag{10.9}$$

The number of degrees of freedom used in evaluating t is one less than the
number of pairs of observations, or N − 1. The reader should note that the
$\bar{D}$ in the numerator of the above formula is in effect $\bar{D}$ minus zero, which is
of course $\bar{D}$. This test is concerned with the significance of the difference of
$\bar{D}$ from zero.

The data below are those obtained for a group of 10 subjects on a choice
reaction-time experiment under stress and nonstress conditions, the stress
agent being electric shock. The figures are number of false reactions over a
series of trials. The problem here is to test whether the means under the
two conditions are significantly different.

Subject | Stress $X_1$ | Nonstress $X_2$ | D     | $D^2$
1       | 7            | 5               | 2     | 4
2       | 9            | 15              | −6    | 36
3       | 4            | 7               | −3    | 9
4       | 15           | 11              | 4     | 16
5       | 6            | 4               | 2     | 4
6       | 3            | 7               | −4    | 16
7       | 9            | 8               | 1     | 1
8       | 5            | 10              | −5    | 25
9       | 6            | 6               | 0     | 0
10      | 12           | 16              | −4    | 16
Sum     | 76           | 89              | −13   | 127
Mean    | 7.60         | 8.90            | −1.30 |

These means are 7.60 and 8.90.
The difference between them is equal to the mean of the differences, or
−1.30. The variance of the differences is

$$s_D^2 = \frac{127}{10} - (-1.30)^2 = 11.01$$

For a two-tailed test we are interested only in the absolute magnitude of
$\bar{D}$. We ignore the negative sign, and t becomes

$$t = \frac{1.30}{\sqrt{11.01/9}} = 1.18$$

The number of degrees of freedom associated with this t is 9. For 9 degrees
of freedom we require a t of 2.262 for significance at the 5 per cent level.
The observed value of t is well below this, and the difference between means
is not significant. We cannot justifiably argue from these data that the mean
number of false reactions under the two conditions is different.
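The difference method for the stress and nonstress data can be sketched as follows. The function and variable names are illustrative; the scores are those of the ten subjects in the example.

```python
import math

stress    = [7, 9, 4, 15, 6, 3, 9, 5, 6, 12]
nonstress = [5, 15, 7, 11, 4, 7, 8, 10, 6, 16]

def difference_method_t(x1, x2):
    """t ratio for paired observations, computed on the differences D = X1 - X2."""
    n = len(x1)
    d = [a - b for a, b in zip(x1, x2)]
    mean_d = sum(d) / n
    var_d = sum(v * v for v in d) / n - mean_d**2   # variance of the D's, divisor N
    se_mean_d = math.sqrt(var_d / (n - 1))          # standard error of the mean difference
    return mean_d / se_mean_d, n - 1, mean_d, var_d

t, df, mean_d, var_d = difference_method_t(stress, nonstress)
print(round(mean_d, 2), round(var_d, 2), round(abs(t), 2), df)
print("significant at .05" if abs(t) >= 2.262 else "not significant at .05")
```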

The method described above takes into account the correlation between 
the paired measurements. This results because the variance of differences 
is related to the correlation between the paired measurements (Sec. 7.7) by 
the formula 


$$ s_D^2 = s_1^2 + s_2^2 - 2 r_{12} s_1 s_2 $$


When $s_1$, $s_2$, and $r_{12}$ have been computed, as will frequently be the case, the
variance of differences $s_D^2$ can be readily obtained from the above formula
and need not be obtained by direct calculation on the differences themselves.
A positive correlation between the paired measurements will reduce the size
of $s_D^2$ and $s_{\bar{D}}^2$.
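
As a check on this relation, the following Python sketch computes the variance of the differences both ways for the 10 pairs of scores tabulated in the preceding example. The helper names are hypothetical, and variances are taken with divisor N so that they match the value 11.01 obtained there.

```python
import math

x1 = [7, 9, 4, 15, 6, 3, 9, 5, 6, 12]
x2 = [5, 15, 7, 11, 4, 7, 8, 10, 6, 16]
n = len(x1)

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Variance with divisor N, matching the worked example."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

m1, m2 = mean(x1), mean(x2)
s1, s2 = math.sqrt(variance(x1)), math.sqrt(variance(x2))
covariance = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / n
r12 = covariance / (s1 * s2)                      # correlation between the pairs

direct = variance([a - b for a, b in zip(x1, x2)])          # from the differences
from_formula = s1 ** 2 + s2 ** 2 - 2 * r12 * s1 * s2        # from s1, s2 and r12

print(f"r12 = {r12:.3f}  direct = {direct:.2f}  from formula = {from_formula:.2f}")
```

Because the correlation in these data is positive, the variance of the differences is smaller than the sum of the two separate variances.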


10.7. Significance of the Difference between Variances for 
Independent Samples 


Occasions arise where a test of the significance of the difference between 
the variances of measurements for two independent samples is required. 
In the conduct of a simple experiment using control and experimental 
groups, the effect of the experimental condition may reflect itself not only in 
a mean difference between the two groups but also in a variance difference. 
For example, in an experiment designed to study the effect of a distracting 
agent, such as noise, on motor performance the effect of the distraction may 
be to greatly increase the variability of performance, in addition possibly to 
exerting some effect upon the mean. The variances obtained in any experi- 
ment should always be the object of scrutiny and comparison. A common 
situation, where a test of the significance of the difference between variances 
is required, is in relation to the t test for the significance of the difference
between two means. This test assumes the equality of variances in the
populations from which the samples are drawn; that is, it assumes that
$\sigma_1^2 = \sigma_2^2 = \sigma^2$. This condition is usually spoken of as homogeneity of
variance. 

Let $s_1^2$ and $s_2^2$ be two variances based on independent samples. We may
consider the difference $s_1^2 - s_2^2$. An alternate procedure is to consider the
ratio $s_1^2/s_2^2$ or $s_2^2/s_1^2$. If the two variances are equal, this ratio will be unity.
If they differ and $s_1^2 > s_2^2$, then $s_1^2/s_2^2 > 1$ and $s_2^2/s_1^2 < 1$. A departure
of the variance ratio from unity is indicative of a difference between variances, 
the greater the departure the greater the difference. Quite clearly, a test of 
the significance of the departure of the ratio of two variances from unity will 
serve as a test of the significance of the difference between the two variances. 


To apply such a test the sampling distribution of the ratio of two variances
is required. To conceptualize such a sampling distribution, consider two
normal populations A and B with the same variance $\sigma^2$. Draw samples of
$N_1$ cases from A and $N_2$ cases from B, calculate unbiased variance estimates
$s_1^2$ and $s_2^2$, and compute the ratio $s_1^2/s_2^2$. Continue this procedure until a
large number of variance ratios is obtained. Always place the variance of the
sample drawn from A in the numerator and the variance of the sample
drawn from B in the denominator. Some of the variance ratios will be
greater than unity; others will be less than unity. The frequency distribution
of the variance ratios for a large number of pairs of variances is an
experimental sampling distribution. The corresponding theoretical sampling
distribution of variance ratios is known as the distribution of F. The variance
ratio is known as an F ratio; that is, $F = s_1^2/s_2^2$, or $F = s_2^2/s_1^2$.

In the above illustration samples of $N_1$ are drawn from one population
and samples of $N_2$ from another. $N_1 - 1$ and $N_2 - 1$ degrees of freedom
are associated with the two variance estimates. A separate sampling
distribution of F exists for every combination of degrees of freedom. Table
D of the Appendix shows values of F required for significance at the 5 and 1 
per cent levels for varying combinations of degrees of freedom. This 
table shows values of F equal to or greater than unity. It does not show 
values of F less than unity. The number of degrees of freedom associated 
with the variance estimates in the numerator and denominator are shown 
along the top and to the left, respectively, of Table D. The numbers in 
lightface type are the values for significance at the 5 per cent level, and those 
in boldface type the values at the 1 per cent level. These values cut off 
5 and 1 per cent of one tail of the distribution of F. 

In testing the significance of the difference between two variances, the 
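null hypothesis $H_0\colon \sigma_1^2 = \sigma_2^2 = \sigma^2$ is assumed. We then find the ratio of the
two unbiased variance estimates. These are

$$ s_1^2 = \frac{\sum (X - \bar{X}_1)^2}{N_1 - 1} \qquad \text{and} \qquad s_2^2 = \frac{\sum (X - \bar{X}_2)^2}{N_2 - 1} $$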


No prior grounds exist for deciding which variance estimate should be placed 
in the numerator and which in the denominator of the F ratio. In practice 
the larger of the two variance estimates is always placed in the numerator 
and the smaller in the denominator. In consequence the F ratio in this 
situation is always greater than unity. The F ratio is calculated, referred 
to Table D of the Appendix, and a significance level determined. At this 
point a slight complication arises. The obtained significance level must be 
doubled. Table D shows values required for significance at the 5 and the 
1 per cent levels. In comparing the variances for two independent groups
these become the 10 and the 2 per cent levels. The reason for this complication
resides in the fact that the larger of the two variances has been
placed in the numerator of the F ratio. This means that we have considered
one tail only of the F distribution. Not only must we consider
the probability of obtaining $s_1^2/s_2^2$ but also the probability of $s_2^2/s_1^2$. Where
interest is in the significance of the difference, regardless of direction, the
required per cent or probability levels are simply obtained by doubling those
shown in Table D.

Table D has been prepared for use with the analysis of variance (Chap. 
15) which makes extensive use of the F ratio. In the analysis of variance
the decision as to which variance estimate should be put in the numerator
and which in the denominator is made on grounds other than their relative
size. Consequently, in the analysis of variance, F ratios less than unity can 
occur and Table D provides the appropriate probabilities without any 
doubling procedure. 

To illustrate, a psychological test is administered to a sample of 31 boys 
and 26 girls. The sum of squares of deviations, $\sum (X - \bar{X})^2$, is 1,926 for
boys and 2,875 for girls. Unbiased variance estimates are obtained by
dividing the sum of squares by the number of degrees of freedom. The df
for boys is 31 - 1 = 30 and for girls 26 - 1 = 25. The variance estimate
for boys is 1,926/30 = 64.20 and for girls 2,875/25 = 115.00. Are boys 
significantly different from girls in the variability of their performance on 
this test? The F ratio is 115.00/64.20 = 1.79. The df for the numerator is 
25 and for the denominator 30. Referring this F to Table D we see that a 
value of F of about 1.88 is required for significance at the 5 per cent level of
Table D; doubling this probability, 1.88 is the value required at the 10 per cent
level for the present comparison. Since 1.79 falls below even this value, the
difference between the variances for boys and girls cannot be considered
statistically significant. The evidence is insufficient to warrant rejection of
the null hypothesis. 
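
The arithmetic of this example can be sketched in Python as follows. The sketch is illustrative: the variable names are arbitrary, and the critical value 1.88 is the approximate Table D figure quoted in the text, not something the code computes.

```python
# Sums of squared deviations and sample sizes quoted in the example above.
var_boys = 1926.0 / (31 - 1)     # unbiased estimate, 30 df
var_girls = 2875.0 / (26 - 1)    # unbiased estimate, 25 df

# Place the larger estimate in the numerator so that F is at least unity.
ordered = sorted(
    [(var_boys, 30, "boys"), (var_girls, 25, "girls")], reverse=True
)
(num_var, df_num, num_name), (den_var, df_den, den_name) = ordered
f_ratio = num_var / den_var

F_TABLE_05 = 1.88  # approximate tabled 5 per cent value for 25 and 30 df (from the text)
# Because the larger variance was chosen for the numerator, the tabled
# 5 per cent point corresponds to the 10 per cent level for this comparison.
significant_at_10 = f_ratio >= F_TABLE_05

print(f"F = {num_name}/{den_name} = {f_ratio:.2f} with df = ({df_num}, {df_den})")
print("significant at the 10 per cent level:", significant_at_10)
```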


10.8. Significance of the Difference between Correlated Variances 


Given a set of paired observations, the two variances are not independent 
estimates. Data of this kind arise when the same subjects are tested under 
two experimental conditions, or matched samples are used. For example, in 
an experiment designed to study the effects of an educational program on 
attitude change, attitudes may be measured, an educational program applied, 
and attitudes remeasured. It may be hypothesized that some change in 
variance of attitude-test scores may result. An increase in variance may 
mean that the effect of the program is to reinforce existing attitudes, produc- 
ing more extreme attitudes among individuals at both ends of the attitude 
continuum. A decrease in variance may mean that the effect of the program
is to produce an attitudinal regression to greater uniformity.

If $s_1^2$ and $s_2^2$ are the two unbiased variance estimates and $r_{12}$ is the correlation
between the paired observations, the quantity

$$ t = \frac{(s_1^2 - s_2^2)\sqrt{N - 2}}{\sqrt{4 s_1^2 s_2^2 (1 - r_{12}^2)}} \tag{10.10} $$

has a t distribution with $N - 2$ degrees of freedom.

By way of illustration let $s_1^2$ and $s_2^2$ be unbiased variance estimates of
attitude scale scores before and after the administration of an educational
program. Let $s_1^2 = 153.20$ and $s_2^2 = 102.51$ where N = 38. The correlation
between the before-and-after attitude measures is .60. Are the two
variances significantly different from each other? We obtain

$$ t = \frac{(153.20 - 102.51)\sqrt{38 - 2}}{\sqrt{4 \times 153.20 \times 102.51\,(1 - .36)}} = 1.52 $$


The number of degrees of freedom is 38 - 2 = 36. For significance at the
5 per cent level, a value of t equal to or greater than about 2.03 is required.
The evidence is insufficient to warrant rejection of the null hypothesis. We 
cannot argue that the intervening educational program has changed the 
variability of attitudes. 
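
A minimal Python sketch of formula (10.10) follows. The function name is hypothetical, the inputs are the figures given in the illustration, and the critical value 2.03 is the approximate tabled value quoted in the text.

```python
import math

def correlated_variance_t(var1, var2, r12, n):
    """t ratio for the difference between two correlated variances,
    following formula (10.10); returns (t, degrees of freedom)."""
    numerator = (var1 - var2) * math.sqrt(n - 2)
    denominator = math.sqrt(4.0 * var1 * var2 * (1.0 - r12 ** 2))
    return numerator / denominator, n - 2

t, df = correlated_variance_t(153.20, 102.51, 0.60, 38)

CRITICAL_T_05 = 2.03  # approximate 5 per cent value for 36 df, from the text
print(f"t = {t:.2f} with df = {df}")
print("reject the null hypothesis:", abs(t) >= CRITICAL_T_05)
```

Note that the correlation enters only through the factor $1 - r_{12}^2$: the higher the correlation between the paired observations, the smaller the denominator and the more sensitive the test.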


10.9. Significance of the Difference between Means Where 
Population Variances Are Unequal 


The t test for the significance of the difference between means assumes
equality of the population variances. Where the assumption of equality of
variance is untenable, the ordinary t test should not be applied. Approximate
methods for use where the variances are unequal have been suggested
by Cochran and Cox (1950) and by Welch (1938). The method of Cochran
and Cox makes an adjustment in the value of t required for significance at the
5 or 1 per cent level, or other critical level as may be required. The method 
proposed by Welch makes an adjustment in the number of degrees of freedom. 

To use the Cochran and Cox method we proceed by calculating the 
standard error of the difference between the two means, using the formula

$$ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sum (X - \bar{X}_1)^2}{N_1(N_1 - 1)} + \frac{\sum (X - \bar{X}_2)^2}{N_2(N_2 - 1)}} = \sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} \tag{10.11} $$

The difference between the sample means is then divided by the standard
error of the difference to obtain

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} $$

One sample is based on $N_1$ cases with $N_1 - 1$ degrees of freedom, the other on
$N_2$ cases with $N_2 - 1$ degrees of freedom. Assume that a two-tailed test at
the 5 per cent level is appropriate. Refer to a table of t and obtain the critical
value of t required for significance at the 5 per cent level with $N_1 - 1$ degrees
of freedom. Obtain also the value of t required with $N_2 - 1$ degrees of
freedom. Denote these two values of t as $t_1$ and $t_2$. The approximate value
of t required for significance at the 5 per cent level is given by the formula

$$ t_{.05} = \frac{s_{\bar{X}_1}^2 t_1 + s_{\bar{X}_2}^2 t_2}{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2} \tag{10.12} $$

The value of t obtained by dividing the difference between means by the
standard error of their difference must be equal to or greater than $t_{.05}$ before
significance at the 5 per cent level can be claimed.

Consider the following data: 


Sample A                                   Sample B
$N_1 = 13$                                 $N_2 = 9$
$\bar{X}_1 = 26.99$                        $\bar{X}_2 = 15.10$
$\sum (X - \bar{X}_1)^2 = 1,128$           $\sum (X - \bar{X}_2)^2 = 1,269$
$s_{\bar{X}_1}^2 = 7.23$                   $s_{\bar{X}_2}^2 = 17.62$


The standard error of the difference between means is

$$ s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{1{,}128}{13(13 - 1)} + \frac{1{,}269}{9(9 - 1)}} = \sqrt{7.23 + 17.62} = 4.98 $$

Divide this into the difference between means to obtain

$$ t = \frac{26.99 - 15.10}{4.98} = 2.39 $$

The values of t required for significance at the 5 per cent level for 13 - 1 = 12
and 9 - 1 = 8 degrees of freedom are, respectively, 2.179 and 2.306. The
value required for significance at the 5 per cent level in testing the significance
of the difference between means is then

$$ t_{.05} = \frac{7.23 \times 2.179 + 17.62 \times 2.306}{7.23 + 17.62} = 2.27 $$

This value 2.27 is less than the obtained value 2.39. Consequently we may 
conclude that the difference between means is significant at the 5 per cent 
level. 
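
The Cochran and Cox procedure can be sketched in Python as below. The function name and argument order are hypothetical; the tabled t values 2.179 and 2.306 are taken from the text. Because the sketch carries unrounded intermediate values, its results may differ in the second decimal place from a hand computation that rounds at each step.

```python
import math

def cochran_cox(mean1, ss1, n1, t1, mean2, ss2, n2, t2):
    """Observed t and adjusted critical t following (10.11) and (10.12).

    ss1, ss2 are sums of squared deviations; t1, t2 are the tabled critical
    values for n1 - 1 and n2 - 1 degrees of freedom.
    """
    v1 = ss1 / (n1 * (n1 - 1))          # squared standard error of mean 1
    v2 = ss2 / (n2 * (n2 - 1))          # squared standard error of mean 2
    se_diff = math.sqrt(v1 + v2)
    t_obs = abs(mean1 - mean2) / se_diff
    t_crit = (v1 * t1 + v2 * t2) / (v1 + v2)   # weighted average of t1 and t2
    return t_obs, t_crit

t_obs, t_crit = cochran_cox(26.99, 1128, 13, 2.179, 15.10, 1269, 9, 2.306)
print(f"observed t = {t_obs:.3f}, required t = {t_crit:.3f}")
print("significant at the 5 per cent level:", t_obs >= t_crit)
```

The required value is a weighted average of the two tabled values, so it always lies between them and is pulled toward the value belonging to the sample whose mean has the larger sampling variance.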

Another approximate method proposed by Welch (1947) requires the 
calculation of a t value as above by dividing the difference between means by
their standard error. We then refer this value to the table of t using the
following formula for the number of degrees of freedom:

$$ df = \frac{(s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2)^2}{(s_{\bar{X}_1}^2)^2/(N_1 + 1) + (s_{\bar{X}_2}^2)^2/(N_2 + 1)} - 2 \tag{10.13} $$

Applying this formula to the previous data we obtain

$$ df = \frac{(7.23 + 17.62)^2}{(7.23)^2/14 + (17.62)^2/10} - 2 = 15.76 $$


The value of df will seldom be a whole number. If df is taken as 16, the 
value of t required for significance at the 5 per cent level is 2.12. If df is
taken as 15, the value is 2.13. In either case the observed value of t, 2.39,
exceeds the value required for significance at the 5 per cent level and we may 
conclude that the difference between means is significant. This result is in 
agreement with that obtained using the Cochran and Cox procedure. 
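
The degrees-of-freedom adjustment of formula (10.13) is easily sketched in Python. The function name is hypothetical, and the inputs are the rounded squared standard errors 7.23 and 17.62 used in the text. The sketch follows the form of the formula given here, with $N + 1$ in the denominators and 2 subtracted; this differs from the Welch-Satterthwaite form used by most present-day software, which divides by $N - 1$ and subtracts nothing.

```python
def welch_df(v1, n1, v2, n2):
    """Approximate degrees of freedom following formula (10.13).

    v1 and v2 are the squared standard errors of the two means.
    """
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 + 1) + v2 ** 2 / (n2 + 1)) - 2

df = welch_df(7.23, 13, 17.62, 9)
print(f"df = {df:.2f}")
```

Since the result is seldom a whole number, the tabled value of t may be looked up for the whole numbers on either side, as is done in the text.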

The above procedures are approximate. For a more accurate method the 
reader is referred to Welch (1947) and Aspin (1949). The latter author has
prepared tables which assist the comparison of means involving two 
variances, separately estimated. The problem has also been discussed by 
Gronow (1951). 


10.10. Significance of the Difference between Means Where the 
Population Distributions Are Not Normal 


The t test for the significance of the difference between means assumes 
normality of the distributions of the variables in the populations from which 
the samples are drawn. Where the variables are not normally distributed, 
what effect will this have on the probabilities, and significance levels, as 
estimated from the distribution of t?

Under certain conditions the sampling distribution of means of size N, 
where N is large, is closely approximated by the normal distribution. This
result holds regardless of the shape of the distribution in the population from
which the samples are drawn. The closeness of the approximation improves
as N becomes increasingly large. The implication of this is that for large
samples the nonnormality of the populations will not seriously affect the 
estimation of probabilities, except perhaps in cases of very extreme skewness. 

A number of investigators have studied the effect of nonnormal populations 
on the t test for small samples. The empirical evidence suggests that even
for quite small samples, say, of the order of 5 or 10, reasonably large departures
from normality will not seriously affect the estimation of probabilities
for a two-tailed t test. A one-tailed t test is, however, more seriously
affected by nonnormality. This results from the skewness of the sampling
distribution.

Where the data show fairly gross departures from normality it is probably
advisable to use nonparametric, or distribution-free, methods. These methods
provide tests which are independent of the shapes of the distributions in the
populations from which the samples are drawn. They deal with the ordinal
or sign properties of the data. A number of such tests are described in
Chap. 17 of this book. Nonparametric methods are being used with increas- 
ing frequency in psychological research. 


10.11. Significance of the Difference between Two 
Independent Proportions 


Questions arise in the interpretation of experimental results which require 
a test of significance of the difference between two independent proportions. 
The data comprise two samples drawn independently. Of the $N_1$
members in the first sample, $f_1$ have the attribute A. Of the $N_2$ members in
the second sample, $f_2$ have the attribute A. The proportions having the
attribute in the two samples are $f_1/N_1 = p_1$ and $f_2/N_2 = p_2$. Can the two
samples be regarded as random samples drawn from the same population?
Is $p_1$ significantly different from $p_2$? To illustrate, in a public opinion poll
the proportion .65 in a sample of urban residents may express a favorable 
attitude toward a particular issue as against a proportion .55 in a sample of 
rural residents. May the difference between the proportions be interpreted 
as indicative of an actual urban-rural difference in opinion? To illustrate 
further, the proportion of failures in air-crew training in two training periods 
may be .42 and .50. Does this represent a significant change in the proportion
of failures, or may the difference be attributed to sampling considerations? 

The standard error of a single proportion is estimated by the formula

$$ s_p = \sqrt{\frac{pq}{N}} $$

where p = sample value of a proportion
      q = 1 - p

The standard error of the difference between two proportions based on
independent samples is estimated by

$$ s_{p_1 - p_2} = \sqrt{pq\left(\frac{1}{N_1} + \frac{1}{N_2}\right)} \tag{10.14} $$

where p is an estimate based on the two samples combined and q = 1 - p.
The value p is obtained by adding together the frequencies of occurrence of
the attribute in the two samples and then dividing this by the total number
in the two samples. Thus

$$ p = \frac{f_1 + f_2}{N_1 + N_2} $$

where $f_1$ and $f_2$ are the two frequencies.

The justification for combining data from the two samples to obtain a 
single estimate of p resides in the fact that in all cases where the difference 
between two proportions is tested, the null hypothesis is assumed. This 
hypothesis states that no difference exists in the population proportions. 
Because the null hypothesis is assumed, we may use an estimate of p based on 
the data combined for the two samples. This procedure is analogous to that 
used in the t test for the difference between means for independent samples
where the sums of squares for the two samples are combined to obtain a 
single variance estimate. 

To test the difference between two proportions we divide the observed 
difference between the proportions by the estimate of the standard error of 
the difference to obtain 


$$ z = \frac{p_1 - p_2}{s_{p_1 - p_2}} = \frac{p_1 - p_2}{\sqrt{pq[(1/N_1) + (1/N_2)]}} \tag{10.15} $$


The value z may be interpreted as a deviate of the unit normal curve, provided
$N_1$ and $N_2$ are reasonably large and p is neither very small nor very large.
As usual for a two-tailed test, values of 1.96 and 2.58 are required for
significance at the 5 and 1 per cent levels.

How large should the N's be and how far should p depart from extreme
values before this ratio can be interpreted as a deviate of the unit normal
curve? An arbitrary rule may be used here. If the smaller value of p or q
multiplied by the smaller value of N exceeds 5, then the ratio may be
interpreted with reference to the normal curve. Thus if p = .60, q = .40,
$N_1 = 20$, and $N_2 = 30$, the product .40 × 20 = 8 and the normal curve
may be used. 

To illustrate, we refer to data obtained in a study of the attitudes of 
Canadians to immigrants and immigration policy. Independent samples of 
French- and English-speaking Canadians were used. Subjects were asked 
whether they agreed or disagreed with present government immigration 
practices. In the French-speaking sample of 300 subjects, 176 subjects 
indicated agreement. The proportion $p_1$ is 176/300 = .587. In the
English-speaking sample of 500 subjects, 384 indicated agreement. The proportion
$p_2$ is 384/500 = .768. By combining data for the two samples we obtain a
value

$$ p = \frac{176 + 384}{300 + 500} = .700 $$

The value of q is 1 - .700 = .300. The estimate of the standard error of the
difference is

$$ s_{p_1 - p_2} = \sqrt{.700 \times .300\left(\frac{1}{300} + \frac{1}{500}\right)} = .0335 $$

The required z value is

$$ z = \frac{.768 - .587}{.0335} = 5.40 $$

Interpreting the value 5.40 as a unit-normal-curve deviate we observe
immediately that the difference is highly significant. The chances are of the
order of one in many millions that the observed difference could result from
sampling. We may very safely conclude from these data that a real difference
exists between French- and English-speaking Canadians on the question
asked.
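
The test for two independent proportions, together with the arbitrary rule for use of the normal curve, may be sketched in Python as follows. The function name and return layout are hypothetical; the sketch works from the raw frequencies rather than from rounded proportions.

```python
import math

def two_proportion_z(f1, n1, f2, n2):
    """z for the difference between two independent proportions,
    following (10.14) and (10.15). Returns (z, standard error, rule_ok)."""
    p1, p2 = f1 / n1, f2 / n2
    p = (f1 + f2) / (n1 + n2)            # pooled estimate under the null hypothesis
    q = 1.0 - p
    se = math.sqrt(p * q * (1.0 / n1 + 1.0 / n2))
    # Arbitrary rule from the text: smaller of p, q times smaller N must exceed 5.
    rule_ok = min(p, q) * min(n1, n2) > 5
    return (p1 - p2) / se, se, rule_ok

# French-speaking sample: 176 of 300; English-speaking sample: 384 of 500.
z, se, rule_ok = two_proportion_z(176, 300, 384, 500)
print(f"standard error = {se:.4f}, z = {z:.2f}, normal curve applicable: {rule_ok}")
print("significant at the 1 per cent level:", abs(z) >= 2.58)
```

The sign of z merely reflects which sample is entered first; for a two-tailed test only its absolute value is compared with 1.96 or 2.58.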

An alternative, but closely related, method exists for testing the significance 
of the difference between proportions for independent samples. This method 
uses chi square and is described in Chap. 11. 


10.12. Significance of the Difference between Two 
Correlated Proportions 


Frequently in psychological work we wish to test the significance of the 
difference between two proportions based on the same sample of individuals or 
on matched samples. The data consist of pairs of observations and are 
usually nominal in type. The paired observations may exhibit a correlation, 
which must be taken into consideration in testing the difference between 
proportions. To illustrate, a psychological test may be administered to a 
sample of N individuals. The proportions passing items 1 and 2 are $p_1$ and
$p_2$. Paired observations are available for each individual. One individual
may “pass” item 1 and also “pass” item 2. A second individual may “pass”
item 1 and “fail” item 2. A third individual may “fail” both items. The
paired observations may be tabulated in a 2 X 2 table. A tendency may 
exist for individuals who pass item 1 to also pass item 2 and for those who 
fail item 1 to also fail item 2. Thus the items are correlated. A further 
illustration arises where attitudes are measured with an attitude scale before 
and after a program designed to induce attitude change. On any particular 
attitude item a before response and an after response are available. Thus the 
data are comprised of a set of paired observations. To apply a test of 
significance to the difference between before and after proportions on any 
particular item requires that the correlation between responses be taken into 
account. 

We proceed by tabulating the data in the form of a fourfold, or 2 X 2, 
table. A table with four cell frequencies is obtained. By way of illustration,
assume that the data are “pass” and “fail” on two test items. The data
may be represented schematically as follows:


                  Frequencies                            Proportions
                    Item 2                                  Item 2
                 Fail     Pass                           Fail     Pass
Item 1   Pass     A        B      A + B    Item 1   Pass   a        b      p_1
         Fail     C        D      C + D             Fail   c        d      q_1
                A + C    B + D      N                     q_2      p_2     1.00


The capital letters represent frequencies. The small letters are proportions
obtained by dividing the frequencies by N. The proportions passing the two
items are $p_1$ and $p_2$. We wish to test the significance of the difference
between $p_1$ and $p_2$.

An estimate of the standard error of the difference between two correlated 
proportions is given by the formula 


$$ s_{p_1 - p_2} = \sqrt{\frac{a + d}{N}} \tag{10.16} $$


This formula is due to McNemar (1947, 1955). It takes into account the
correlation between the paired observations. A normal deviate z is obtained
by dividing the difference between the two proportions by the standard error
of the difference. Thus

$$ z = \frac{p_1 - p_2}{\sqrt{(a + d)/N}} \tag{10.17} $$

When the sum of the two cell frequencies, A + D, is reasonably large, this
ratio can be interpreted as a unit-normal-curve deviate, values of 1.96 and
2.58 being required for significance at the 5 and 1 per cent levels for a
two-tailed test. In this context a reasonably large value of A + D may be
taken as about 20 or above.

It may be shown that the formula for the value of z given above reduces to

$$ z = \frac{A - D}{\sqrt{A + D}} \tag{10.18} $$

where A and D are the cell frequencies. For computational purposes this is
the more useful formulation.

To illustrate, consider the following fictitious data relating to attitude
change. Let us assume an initial testing followed by a program intended to
produce a change in attitude, and then a second testing with the same attitude
scale. On a particular question let the data for the two testings be as follows:


                   Frequencies                              Proportions
                       2d                                       2d
                 Disagree   Agree                         Disagree   Agree
1st   Agree         10        50      60    1st   Agree      .05       .25     .30
      Disagree     110        30     140          Disagree   .55       .15     .70
                   120        80     200                     .60       .40    1.00

Inspection of the above tables indicates a high correlation in response 
between the first and second testings. We wish to test the significance of the 
difference between .40 and .30. The standard error of the difference between 
the two proportions is

$$ s_{p_1 - p_2} = \sqrt{\frac{.05 + .15}{200}} = .0316 $$

The value of z is

$$ z = \frac{.40 - .30}{.0316} = 3.16 $$


In this case the difference is significant. It exceeds the value of 2.58 required 
for significance at the 1 per cent level for a two-tailed test. Arguments may 
be advanced for the use of a one-tailed test with the above data. It may be 
assumed that knowledge of a program intended to induce attitude change 
may warrant a hypothesis about the direction of the change. In either case 
the result is significant. 
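
Both forms of the test for correlated proportions can be sketched in Python to confirm that they agree. The variable names are arbitrary; the cell frequencies are those of the fictitious attitude data above, with A and D the two cells in which the response changed between testings.

```python
import math

# Rows: first testing (agree, disagree); columns: second testing (disagree, agree).
A, B = 10, 50
C, D = 110, 30
N = A + B + C + D

p1 = (A + B) / N          # proportion agreeing at the first testing
p2 = (B + D) / N          # proportion agreeing at the second testing
a, d = A / N, D / N       # proportions in the two "changed" cells

se = math.sqrt((a + d) / N)                 # formula (10.16)
z_from_proportions = (p1 - p2) / se         # formula (10.17)
z_from_frequencies = (A - D) / math.sqrt(A + D)   # formula (10.18)

print(f"p1 = {p1:.2f}, p2 = {p2:.2f}, standard error = {se:.4f}")
print(f"|z| = {abs(z_from_proportions):.2f} (proportions), "
      f"{abs(z_from_frequencies):.2f} (frequencies)")
print("A + D large enough for the normal curve:", A + D >= 20)
```

The agreement of the two forms follows from the fact that $p_1 - p_2 = (a + b) - (b + d) = a - d$, so that only the two cells in which responses changed contribute to the test.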


10.13. Sampling Distribution of the Correlation Coefficient 


We may draw a large number of samples from a population, compute a 
correlation coefficient for each sample, and prepare a frequency distribution 
of correlation coefficients. Such a frequency distribution is an experimental 
sampling distribution of the correlation coefficient. To illustrate, casual 
observation suggests that a positive correlation exists between height and 
weight. A number of samples of 25 cases may be drawn at random from a 
population of adult males, and a correlation coefficient between height and 
weight computed for each sample. These coefficients will display variation 
one from another. By arranging them in the form of a frequency distribu- 
tion an experimental sampling distribution of the correlation coefficient is 
obtained. The mean of this distribution will tend to approach the population 
value of the correlation coefficient with increase in the number of samples. 
Its standard deviation will describe the variability of the coefficients from 
sample to sample. A further illustration may prove helpful. By throwing
a pair of dice a number of times, say, a white one and a red one, a set of paired
observations is obtained. A correlation coefficient may be calculated for
the paired observations. Since the two dice are independent, the expected
value of this correlation coefficient is zero. However, for any particular
sample of N throws, a positive or a negative correlation may result. A large
number of samples of N throws may be obtained, a correlation coefficient
computed for each sample, and a frequency distribution of the coefficients
prepared. The mean of this experimental sampling distribution will tend to
approach zero, the population value of the correlation coefficient, and its
standard deviation will be descriptive of the variability of the correlation in
drawing samples of size N from this particular kind of population. Note
that here, as in all sampling problems, a distinction is drawn between a
population value and an estimate of that value based on a sample. The
symbol $\rho$ is used to refer to the population value of the correlation coefficient,
and r is the sample value.
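
The dice illustration lends itself to a small simulation. The Python sketch below is one way of building such an experimental sampling distribution; the function names, the choice of 25 throws per sample and 2,000 samples, and the fixed seed are all assumptions made for the illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_r(n_throws, n_samples, seed=0):
    """Throw a white and a red die n_throws times, n_samples times over,
    and return the correlation coefficient obtained in each sample."""
    rng = random.Random(seed)
    coefficients = []
    for _ in range(n_samples):
        white = [rng.randint(1, 6) for _ in range(n_throws)]
        red = [rng.randint(1, 6) for _ in range(n_throws)]
        coefficients.append(pearson_r(white, red))
    return coefficients

rs = simulate_r(n_throws=25, n_samples=2000)
mean_r = sum(rs) / len(rs)
sd_r = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / len(rs))
print(f"mean of r = {mean_r:.3f}, standard deviation of r = {sd_r:.3f}")
```

With independent dice the mean of the coefficients should lie close to zero, while individual samples of 25 throws will still show appreciable positive and negative correlations.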

The shape of the sampling distribution of the correlation coefficient 
depends on the population value $\rho$. As $\rho$ departs from zero, the sampling
distribution becomes increasingly skewed. When $\rho$ is high positive, say,
$\rho = .80$, the sampling distribution has extreme negative skewness. Similarly,
when $\rho$ is high negative, say, $\rho = -.80$, the distribution has extreme
positive skewness. When $\rho = 0$, the sampling distribution is symmetrical
and for large values of N, say, 30 or above, is approximately normal. The
reason for the increase in skewness in the sampling distribution as $\rho$ departs
from zero is intuitively plausible. In sampling, for example, from a population
where $\rho = .90$, values greater than 1.00 cannot occur, whereas values
extending from .90 to -1.00 are theoretically possible. The range of
possible variation below .90 is far greater than the range above .90. This
suggests that the sample values may exhibit greater variability below than 
above .90, a circumstance which leads to negative skewness. 

The standard deviation of the theoretical sampling distribution of r, the
standard error, is given by the formula

$$ \sigma_r = \frac{1 - \rho^2}{\sqrt{N - 1}} \tag{10.19} $$

When $\rho$ departs appreciably from zero, this formula is of little use, because
the departures of the sampling distribution from normality make interpreta- 
tion difficult. 

Difficulties resulting from the skewness of the sampling distribution of the 
correlation coefficient are resolved by a method developed by R. A. Fisher. 
Values of r are converted to values of $z_r$, using the transformation

$$ z_r = \tfrac{1}{2}\log_e (1 + r) - \tfrac{1}{2}\log_e (1 - r) \tag{10.20} $$

Values of $z_r$ corresponding to particular values of r need not be computed
directly from the above formula, but may be simply obtained from Table E
in the Appendix. For r = .50 the corresponding $z_r$ = .549, for r = .90
$z_r$ = 1.472, and so on. For negative values of r the corresponding $z_r$ values
may be given a negative sign. In a number of sampling problems involving
correlation, r's are converted to $z_r$'s, and a test of significance is applied to the
$z_r$'s instead of to the original r's.

One advantage of this transformation resides in the fact that the sampling
distribution of $z_r$ is for all practical purposes independent of $\rho$. The
distribution has the same variability for a given N regardless of the size of $\rho$.
Another advantage is that the sampling distribution of $z_r$ is approximately
normal. Values of $z_r$ can be interpreted in relation to the normal curve.
The standard error of $z_r$ is given by

$$ s_z = \frac{1}{\sqrt{N - 3}} \tag{10.21} $$

The standard error is seen to depend entirely on the sample size.

The $z_r$ transformation may be used to obtain confidence limits for r. Let
r = .82 for N = 147. The corresponding $z_r$ = 1.157. The standard error
of $z_r$, given by $1/\sqrt{N - 3}$, is .083. The 95 per cent confidence limits are
obtained by taking 1.96 times the standard error above and below the
observed value of $z_r$, or $z_r \pm 1.96 s_z$. These are 1.157 + 1.96 × .083 = 1.320
and 1.157 - 1.96 × .083 = .994. These two $z_r$'s may now be converted
back to r's; where $z_r$ = 1.320, r = .867 and where $z_r$ = .994, r = .759.
Thus we may assert with 95 per cent confidence that the population value of 
the correlation coefficient falls within the limits .759 and .867. In practice 
we are infrequently concerned with fixing confidence intervals for correlation 
coefficients. The most frequently occurring problems are testing the sig- 
nificance of a correlation coefficient from zero and testing the significance of 
the difference between two correlation coefficients. 
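
Where Table E is not at hand, the transformation and its inverse are available in most programming languages. The Python sketch below reproduces the confidence limits of the example; the function name is hypothetical, and the sketch relies on the fact that the transformation (10.20) is the inverse hyperbolic tangent, so that `math.atanh` and `math.tanh` convert r to $z_r$ and back.

```python
import math

def fisher_confidence_limits(r, n, z_crit=1.96):
    """Confidence limits for a correlation coefficient by way of z_r.

    z_crit = 1.96 gives 95 per cent limits.
    """
    z_r = math.atanh(r)                 # 0.5*log(1 + r) - 0.5*log(1 - r)
    se = 1.0 / math.sqrt(n - 3)         # standard error of z_r
    lower_z, upper_z = z_r - z_crit * se, z_r + z_crit * se
    return math.tanh(lower_z), math.tanh(upper_z)   # convert back to r

lower, upper = fisher_confidence_limits(0.82, 147)
print(f"95 per cent confidence limits for r: {lower:.3f} to {upper:.3f}")
```

The limits are not symmetrical about r = .82; the interval extends farther below the sample value than above it, which reflects the negative skewness of the sampling distribution of r when the population value is high and positive.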


10.14. Significance of a Correlation Coefficient


Testing the significance of the correlation between a set of paired observa- 
tions is a frequent problem in psychological research. We begin by assuming 
the null hypothesis that the value of the correlation coefficient is equal to
zero, or $H_0\colon \rho = 0$. A test of significance may then be applied using the
distribution of t. The t value required is given by the formula

$$ t = r\sqrt{\frac{N - 2}{1 - r^2}} \tag{10.22} $$

The number of degrees of freedom associated with this value of t is $N - 2$.


The loss of 2 degrees of freedom results because testing the significance of
r from zero is equivalent to testing the significance of the slope of a regression
line from zero. The reader will recall that the correlation coefficient is the
slope of a regression line in standard-score form. The number of degrees of
freedom associated with the variability about a straight line fitted to a set
of points is two less than the number of observations. A straight line will
always fit two points exactly, and no freedom to vary is possible. With
three points there is 1 degree of freedom, with four points 2 degrees of freedom,
and so on.

Consider an example where r = .50 and N = 20. We obtain

$$ t = .50\sqrt{\frac{20 - 2}{1 - .50^2}} = 2.45 $$

The df = 20 - 2 = 18. Referring to the table of t, Table B in the Appendix,
we find that for this df a t of 2.10 is required for significance at the 5 per cent
level and a t of 2.88 at the 1 per cent level. The sample value of t falls
between these two values. It may be said to be significant at the 5 per cent 
level. 

Table F of the Appendix presents a tabulation of the values of r required
for significance at different levels. We note that where the number of 
degrees of freedom is small, a large value of r is required for significance. 
For example, where df = 5, a value of r > .754 is required before we can 
argue at the 5 per cent level that the r is significant. Even for df = 20, a 
value of r > .423 is required for significance at the 5 per cent level. This 
means that little importance can be attached to correlation coefficients 
calculated on small samples unless these coefficients are fairly substantial in 
size. 
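Entries of the kind tabulated in Table F can be obtained by solving formula (10.22) for r, which gives r = t/√(t² + df). The sketch below is a hedged illustration: the function name is an assumption, and the two-tailed 5 per cent critical values of t (2.571 for df = 5 and 2.086 for df = 20) are standard table values supplied here rather than taken from the text.

```python
import math

def critical_r(t_crit: float, df: int) -> float:
    """Smallest absolute r that reaches a given critical t, with df = N - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-tailed 5 per cent critical t values (assumed from a standard t table).
for df, t_crit in [(5, 2.571), (20, 2.086)]:
    print(df, round(critical_r(t_crit, df), 3))
```

The results agree with the values .754 and .423 quoted above to within rounding of the tabled t.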


10.15. Significance of the Difference between Two Correlation 
Coefficients for Independent Samples 


Consider a situation where two correlation coefficients, r₁ and r₂, are
obtained on two independent samples. The correlation coefficients may, for
example, be correlations between intelligence-test scores and
mathematics-examination marks for two different freshman classes. We wish to test
whether r₁ is significantly different from r₂, that is, whether the two samples
can be considered random samples from a common population. The null
hypothesis is H₀: ρ₁ = ρ₂ or H₀: ρ₁ − ρ₂ = 0.

The significance of the difference between r₁ and r₂ can be readily tested
using Fisher's zᵣ transformation. Convert r₁ and r₂ to zᵣ's using Table E of
the Appendix. As stated previously, the sampling distribution of zᵣ is
approximately normal with a standard error given by $s_{z_r} = 1/\sqrt{N - 3}$.




The standard error of the difference between two values of z, is given by 


$$s_{z_{r_1} - z_{r_2}} = \sqrt{s_{z_{r_1}}^2 + s_{z_{r_2}}^2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \tag{10.23}$$

By dividing the difference between the two values of z, by the standard error 
of the difference, we obtain the ratio 


$$z = \frac{z_{r_1} - z_{r_2}}{\sqrt{1/(N_1 - 3) + 1/(N_2 - 3)}} \tag{10.24}$$


This is a unit-normal-curve deviate and may be so interpreted. Values of
1.96 and 2.58 are required for significance at the 5 per cent and 1 per cent
levels, respectively.

To illustrate, let the correlations between intelligence scores and mathe- 
matics-examination marks for two freshman classes be .320 and .720. Let 
the number of students in the first class be 53 and in the second 23. Are 
the two coefficients significantly different? The corresponding zᵣ values
obtained from Table E of the Appendix are .332 and .908. The required
normal deviate is 


$$z = \frac{.908 - .332}{\sqrt{1/(53 - 3) + 1/(23 - 3)}} = 2.18$$


The difference between the two correlations is significant at the 5 per cent
level.

The application of a test of significance in a situation of this kind is simple.
The interpretation of what the difference in correlation means may be difficult.
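The same test can be sketched in Python, with Fisher's transformation computed directly as the inverse hyperbolic tangent instead of being read from Table E. The function name `z_difference` is an assumption made for illustration. It returns the second transformed value minus the first, matching the order used in the worked example.

```python
import math

def z_difference(r1: float, n1: int, r2: float, n2: int) -> float:
    """Normal deviate for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)           # Fisher's z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))   # formula (10.23)
    return (z2 - z1) / se                             # formula (10.24)

# Worked example from the text: r = .320 (N = 53) and r = .720 (N = 23).
z = z_difference(0.320, 53, 0.720, 23)
print(f"z = {z:.2f}")  # compare with 1.96 and 2.58
```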


10.16. Significance of the Difference between Two Correlation 
Coefficients for Correlated Samples 


Consider a situation where three measurements have been made on the 
same sample of individuals. Three correlation coefficients result, r₁₂, r₁₃,
and r₂₃. If we wish to compare r₁₂ and r₁₃, or r₁₂ and r₂₃, or r₁₃ and r₂₃, the
method described in the preceding section does not apply. Here the two 
coefficients under comparison are not based on independent samples but are 
based on the same sample and are correlated. 

To test the difference between r₁₂ and r₁₃ under these conditions, we may
calculate a value t by the formula


$$t = \frac{(r_{12} - r_{13})\sqrt{(N - 3)(1 + r_{23})}}{\sqrt{2(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2 r_{12} r_{13} r_{23})}} \tag{10.25}$$

This expression follows the distribution of t with N − 3 degrees of freedom.
Note that to apply this test the correlation r₂₃ is required.




Let X₂ and X₃ be two psychological tests used to predict a criterion measure
of scholastic success X₁. The three correlation coefficients based on a sample
of 100 cases are r₁₂ = .60, r₁₃ = .50, and r₂₃ = .50. Are X₂ and X₃
significantly different as predictors of scholastic success? Is there a reasonable
probability that the difference between the two correlations r₁₂ and r₁₃ can be
explained in terms of sampling error? The value of t is


$$t = \frac{(.60 - .50)\sqrt{(100 - 3)(1 + .50)}}{\sqrt{2(1 - .60^2 - .50^2 - .50^2 + 2 \times .60 \times .50 \times .50)}} = 1.29$$


For df = 97, a t of about 1.99 is required for significance at the 5 per cent 
level. In consequence, the difference between the two correlation coefficients 
cannot be said to be significant. 
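As a check on the arithmetic, formula (10.25) can be sketched in Python. The function name `t_correlated` is an assumption made for illustration, and the sketch inherits the restrictive assumptions of the test itself.

```python
import math

def t_correlated(r12: float, r13: float, r23: float, n: int) -> tuple[float, int]:
    """Return (t, df) for the difference r12 - r13 on one sample, formula (10.25)."""
    numerator = (r12 - r13) * math.sqrt((n - 3) * (1.0 + r23))
    inner = 1.0 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2.0 * r12 * r13 * r23
    return numerator / math.sqrt(2.0 * inner), n - 3

# Worked example from the text: r12 = .60, r13 = .50, r23 = .50, N = 100.
t, df = t_correlated(0.60, 0.50, 0.50, 100)
print(f"t = {t:.2f}, df = {df}")
```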

The above test has certain restrictive assumptions underlying its develop- 
ment and because of these is perhaps not entirely satisfactory. For further 
discussion see Walker and Lev (1953). 


10.17. Effect of Grouping on Sampling Error 


The error introduced by grouping data in the form of a frequency distribu- 
tion exerts no systematic effect on the mean as an estimate of the population 
value. The error variance of the mean computed from grouped data is, 
however, greater than the error variance of the mean computed from 
ungrouped data. The error variance of a mean for grouped data is comprised 
of two components, one resulting from sampling error, the other from group- 
ing error. The standard deviation is systematically influenced by grouping 
error; the effect of grouping error is to increase the standard deviation. In 
computing the standard error of the mean for grouped data by the formula
$s_{\bar{x}} = s_x/\sqrt{N}$, values of $s_x$ uncorrected for grouping should be used. This
results in a value of $s_{\bar{x}}$ which is greater than that obtained by using the
corrected value of $s_x$. The use of the uncorrected value of $s_x$ adjusts for the
increase in the error variance of the mean resulting from grouping error. In 
general, in applying any test of significance to statistics calculated from 
grouped data, values uncorrected for grouping should be used. 


EXERCISES 


1. The following are data for two samples of subjects under two experimental conditions: 


Sample A go T9 1 


6 
Sample B 4 16 11 9 8 


Test the significance of the difference between (a) means and (b) variances. Use a
two-tailed test. 




2. The following are data for two independent samples: 


                 Sample A    Sample B

X̄                   124         120
N                    50          36
Σ(X − X̄)²         5,512       5,184


Test whether the means and variances for sample A are equal to or greater than for 
sample B. 

3. The following are paired measurements obtained for a sample of eight subjects under 
two conditions: 


Condition A 8 17 12 19 5 6 20 3 
Condition B 12 31 17 17 8 LENNA vg 


Test the significance of the difference between (a) means and (b) variances.

4. Test the significance of the difference between two unbiased variance estimates 196
and 361 for 27 paired measurements with a correlation of .40.

5. What advantages attach to matched groups or paired observations in experimentation?

6. The means for two independent samples of 10 and 17 cases are 9.63 and 14.16, respec- 
tively. The unbiased variance estimates are 64.02 and 220.30. Compare the methods 
proposed by Cochran and Cox and by Welch to test the significance of the difference 
between the two means. 

7. In a market survey 24 out of 96 males and 63 out of 180 females indicate a preference 
for a particular brand of cigarettes. Do the data warrant the conclusion that a sex 
difference exists in brand preference? 

8. On an attitude scale 63 and 39 individuals from a sample of 140 indicate agreement to 
items А and B, respectively, and 29 individuals indicate agreement to both items. Is 
there a significant difference in the response elicited by the two items? 

9. Find the confidence limits for (a) r = .05, N = 100, and (b) r = .80, N = 28.

10. Test whether a correlation coefficient of .50 based on a sample of 20 cases differs signifi- 
cantly from zero. 

11. The correlation between psychological-test scores and academic achievement for a
sample of 147 freshmen is .40. The corresponding correlation for a sample of 125
sophomores is .59. Do these correlations differ significantly?


CHAPTER 11


CHI SQUARE


11.1. Introduction


We have previously discussed the application of the binomial, normal, t,
and F distributions. Another distribution of considerable theoretical and
practical importance is the distribution of chi square, or χ². In many
experimental situations we wish to compare observed with theoretical fre- 
quencies. The observed frequencies are those obtained empirically by direct 
observation or experiment. The theoretical frequencies are generated on the 
basis of some hypothesis, or line of theoretical speculation, which is inde- 
pendent of the data at hand. The question arises as to whether the dif- 
ferences between the observed and theoretical frequencies are significant. If 
they are, this constitutes evidence for the rejection of the hypothesis or 
theory that gave rise to the theoretical frequencies. 

Consider, for example, a die. We may formulate the hypothesis that the 
die is unbiased, in which case the probability of throwing any of the six 
possible values in a single toss is 1/6. The frequencies expected on the basis of
this hypothesis are the theoretical frequencies. In a series of 300 throws the 
expected or theoretical frequencies of 1, 2, 3, 4, 5, and 6 are 50, 50, 50, 50, 50, 
and 50. Let us now experiment by throwing the die 300 times. The 
observed frequencies of the values from 1 to 6 are 43, 55, 39, 56, 63, and 44. 
May the differences between the observed and theoretical frequencies be 
considered to result from sampling error? Are the differences highly 
improbable on the basis of the null hypothesis, thereby providing evidence 
for the rejection of the hypothesis that the die is unbiased? 

As a further illustration, let us formulate the hypothesis that in litters of 
rabbits the probability of any birth being either male or female is 1/2. Using
the binomial distribution we ascertain that the expected or theoretical
frequencies of 0, 1, 2, 3, 4, 5, and 6 males in 64 litters of 6 rabbits are 1, 6, 15, 20,
15, 6, and 1. By counting the number of males in 64 litters of six rabbits, the
corresponding observed frequencies are 0, 3, 14, 19, 20, 6, and 2. Do the 
observed and theoretical frequencies differ? 

Consider another example. In a market research project two varieties of
soap, A and B, are distributed to a random sample of 200 housewives. After
a period of use the housewives are asked which they prefer. The results 



indicate that 115 prefer A and 85 prefer B. The hypothesis may be formu- 
lated that no difference exists in consumer preference for the two varieties 
of soap; that a 50:50 split exists. Do the observed frequencies constitute 
evidence for the rejection of this hypothesis? 
The statistic χ² is used in situations of the type described above where a
comparison of observed and theoretical frequencies is required. It has
extensive application in statistical work. χ² is defined by


$$\chi^2 = \sum \frac{(O - E)^2}{E} \tag{11.1}$$


where O = an observed frequency
      E = an expected or theoretical frequency

Thus to calculate a value of χ² we find the differences between the observed
and expected values, square these, divide each squared difference by the
appropriate expected value, and sum over all frequencies.
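TABLE 11.1


CALCULATION OF χ² IN COMPARING OBSERVED AND EXPECTED FREQUENCIES FOR
300 THROWS OF A DIE


Value of   Observed      Expected
die        frequency O   frequency E   O − E   (O − E)²   (O − E)²/E

1              43            50          −7        49         .98
2              55            50           5        25         .50
3              39            50         −11       121        2.42
4              56            50           6        36         .72
5              63            50          13       169        3.38
6              44            50          −6        36         .72

Total         300           300           0              χ² = 8.72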


Table 11.1 illustrates the calculation of χ² in comparing the observed and
expected frequencies for 300 throws of a die. Note that the sum of both the
observed and expected frequencies is equal to N; that is, ΣO = ΣE = N.
The value of χ² obtained in Table 11.1 is 8.72. This is a measure of the
discrepancy between the observed and theoretical frequencies. If the
discrepancy is large, χ² is large. If the discrepancy is small, χ² is small.
Does a value of χ² = 8.72 constitute evidence at an accepted level of
significance for rejecting the null hypothesis? The answer to this question
demands a consideration of the sampling distribution of χ².
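The calculation defined by formula (11.1) can be sketched in Python for the die example. The function name `chi_square` is an assumption introduced for illustration; the observed frequencies are those given in the text.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories, formula (11.1)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [43, 55, 39, 56, 63, 44]   # 300 throws of a die, from the text
expected = [300 / 6] * 6              # unbiased die: 50 expected per face
print(round(chi_square(observed, expected), 2))
```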


11.2. The Sampling Distribution of Chi Square 


The sampling distribution of χ² may be illustrated with reference to the
tossing of coins. Let us assume that in tossing 100 unbiased coins 46 heads 




and 54 tails result. The expected frequencies are 50 heads and 50 tails. A
value of χ² may be calculated as follows:


$$\chi^2 = \frac{(46 - 50)^2}{50} + \frac{(54 - 50)^2}{50} = .32 + .32 = .64$$


In the tossing of 100 coins two frequencies are obtained, one for heads and 
one for tails. These frequencies are not independent. If the frequency of 
heads is 46, the frequency of tails is 100 — 46 = 54. If the frequency of 
heads is 62, the frequency of tails is 100 — 62 = 38. Quite clearly, given 
either frequency, the other is determined. One frequency only is free to 
vary. In this situation 1 degree of freedom is associated with the value of χ².

Let us toss the 100 coins a second time, a third time, and so on, to obtain 
different values of χ². A large number of trials may be made, and a large
number of values of χ² obtained. The frequency distribution of these values
is an experimental sampling distribution of χ² for 1 degree of freedom. It
describes the variation in χ² with repeated sampling. By inspecting this
experimental sampling distribution estimates may be made of the proportion
of times, or the probability, that values of χ² equal to or greater than any
given value will occur due to sampling fluctuation for 1 degree of freedom. 
In the present illustration this assumes, of course, that the coins are unbiased. 
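The experimental sampling distribution described above can be imitated by simulation. The following is a minimal sketch, assuming Python's pseudo-random generator as a stand-in for fair coins; the seed, the number of trials, and the function name are arbitrary choices made for illustration. The cutoff 3.84 is the 5 per cent critical value for 1 degree of freedom.

```python
import random

def coin_chi_square(n_coins: int, rng: random.Random) -> float:
    """Toss n_coins fair coins once and return chi square for heads vs. tails."""
    heads = sum(rng.random() < 0.5 for _ in range(n_coins))
    tails = n_coins - heads
    e = n_coins / 2                      # expected frequency of each outcome
    return (heads - e) ** 2 / e + (tails - e) ** 2 / e

rng = random.Random(1959)                # fixed seed so the run is repeatable
trials = 10_000
values = [coin_chi_square(100, rng) for _ in range(trials)]

# Proportion of trials in which chi square equals or exceeds 3.84.
p_tail = sum(v >= 3.84 for v in values) / trials
print(p_tail)
```

Because the number of heads can take only whole values, the simulated proportion should be near, but need not equal, .05.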

Instead of tossing 100 coins, let us throw 100 unbiased dice, obtain observed
and expected frequencies, and calculate a value of χ². In this situation if
any five frequencies are known, the sixth is determined. Five degrees of
freedom are associated with the value of χ² obtained. The 100 dice may be
tossed a great many times, a value of χ² calculated for each trial, and a
frequency distribution made. This frequency distribution is an experimental
sampling distribution of χ² for 5 degrees of freedom.

The theoretical sampling distribution of χ² is known, and probabilities
may be estimated from it without using the elaborate experimental approach
described for illustrative purposes above. The equation for χ² is complex and
is not given here. It contains the number of degrees of freedom as a variable.
This means that a different sampling distribution of χ² exists for each value
of df. Figure 11.1 shows different chi-square distributions for different
values of df. χ² is always positive, a circumstance which results from
squaring the difference between the observed and expected values. Values
of χ² range from zero to infinity. The right-hand tail of the curve is
asymptotic to the abscissa. For 1 degree of freedom the curve is asymptotic to the
ordinate as well as to the abscissa.




The χ² distribution is used in tests of significance in much the same way
that the normal, t, or the F distributions are used. The null hypothesis is
assumed. This hypothesis states that no actual differences exist between
the observed and expected frequencies. A value of χ² is calculated. If this
value is equal to or greater than the critical value required for significance at
an accepted significance level for the appropriate df, the null hypothesis is
rejected. We may state that the differences between the observed and
expected frequencies are significant and cannot reasonably be explained by
sampling fluctuation. Table C in the Appendix shows values of χ² required


Fig. 11.1. Chi-square distribution and 5 per cent critical regions for various degrees of
freedom. (From Francis G. Cornell, The essentials of educational statistics, John Wiley &
Sons, Inc., New York, 1956.)


for significance at various probability levels for different values of df. The 
critical values at the 5 and 1 per cent levels for df = 1 are, respectively, 
3.84 and 6.64. This means that 5 and 1 per cent of the area of the curve 
fall to the right of ordinates erected at distances 3.84 and 6.64 measured 
along the base line from a zero origin. For df = 5, the corresponding 5 and 
1 per cent critical values are 11.07 and 15.09.

Table C of the Appendix provides the 5 and 1 per cent critical values for 
df = 1 to df = 30. This covers the great majority of situations ordinarily 
encountered in practice. Situations where a χ² is calculated based on a df
greater than 30 are infrequent. Where df is greater than 30 the expression
√(2χ²) − √(2df − 1) has a sampling distribution which is approximately
normal. Values of this expression required for significance at the 5 and 1
per cent levels are 1.64 and 2.33.
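The large-df approximation is simple to compute. The sketch below is illustrative only: the function name is an assumption, and the values χ² = 60 and df = 40 are hypothetical, chosen to show the comparison with 1.64 and 2.33.

```python
import math

def chi_square_normal_deviate(chi2: float, df: int) -> float:
    """Approximate normal deviate sqrt(2*chi2) - sqrt(2*df - 1), for df > 30."""
    if df <= 30:
        raise ValueError("the approximation is intended for df greater than 30")
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)

# Hypothetical values for illustration: chi square = 60 on 40 degrees of freedom.
z = chi_square_normal_deviate(60.0, 40)
print(round(z, 2), z >= 1.64, z >= 2.33)
```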




Table C of the Appendix provides, in addition to critical χ² values at the
5 and 1 per cent levels, values at other per cent or probability levels. Values
are given to the right of which 99, 95, 90, and other percentages of the area of
the curve lie. For example, for df = 5, 95 per cent of the area of the curve
falls to the right of χ² = 1.14 and 99 per cent to the right of χ² = .55. For
df = 5, a value of χ² = 1.14 is just as improbable on the basis of sampling
fluctuation as a value of χ² = 11.07, the critical value at the 5 per cent level.
Very close agreement between observed and expected values may be a highly
improbable event. Where an improbably small value of χ² is obtained, either
the data or the calculation is suspect and should be subject to careful scrutiny.


11.3. Goodness of Fit 


Numerous examples may be found to illustrate the goodness of fit of a 
theoretical to an observed frequency distribution. In one experiment Abbé 
Mendel observed the shape and color of peas in a sample of plants and 
reported the distribution shown in Table 11.2. According to his genetic 


TABLE 11.2
COMPARISON OF OBSERVED AND EXPECTED FREQUENCIES IN SHAPE AND COLOR OF
PEAS IN EXPERIMENT BY MENDEL


                     O        E       O − E   (O − E)²   (O − E)²/E

Round yellow        315    312.75      2.25      5.06       .016
Round green         108    104.25      3.75     14.06       .135
Angular yellow      101    104.25     −3.25     10.56       .101
Angular green        32     34.75     −2.75      7.56       .218


theory the expected frequencies should follow the ratio 9:3:3:1. The 
correspondence between observed and expected frequencies is close. The 
value of χ² = .470, and no grounds exist for rejecting the null hypothesis.
The data lend confirmation to the theory. The value of χ² is smaller than
we should ordinarily expect, the probability associated with it being between
.90 and .95. Assuming the null hypothesis, a fit as good as or better than the
one observed may be expected to occur in between 5 and 10 per cent of
samples of the same size.
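The expected frequencies under the 9:3:3:1 ratio, and the resulting χ², can be reproduced with a short Python sketch. The variable names are assumptions made for illustration; the observed frequencies are those of Table 11.2.

```python
observed = {"round yellow": 315, "round green": 108,
            "angular yellow": 101, "angular green": 32}
ratio = {"round yellow": 9, "round green": 3,
         "angular yellow": 3, "angular green": 1}

n = sum(observed.values())                     # total number of observations
parts = sum(ratio.values())                    # 9 + 3 + 3 + 1 = 16
expected = {k: n * ratio[k] / parts for k in observed}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1                         # only the total N is fixed by the data
print(expected)
print(round(chi2, 3), df)
```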

In testing goodness of fit the hypothesis may be entertained that the 
distribution of a variable conforms to some widely known distribution such 
as the binomial or normal distribution. Johnson (1949), in order to illus- 
trate the goodness of fit of the theoretical binomial distribution to an observed 
distribution, tossed 10 coins 512 times and recorded the proportion of tails. 




TABLE 11.3
GOODNESS OF FIT OF BINOMIAL DISTRIBUTION TO OBSERVED DISTRIBUTION OF
PROPORTION OF TAILS FROM 512 TOSSES OF 10 COINS*


Proportion
of tails      [remaining columns and table body not recoverable]


* Adapted from Palmer Johnson, Statistical methods in research, Prentice-Hall, Inc., 
Englewood Cliffs, N.J., 1949. 


His data are shown in Table 11.3, together with the corresponding theoretical 
binomial frequencies. The mean and standard deviation of the observed 
distribution are X̄ = 0.5 and s = .162. The mean and standard deviation
of the theoretical binomial are X̄ = 0.5 and s = .156.

Note that in the calculation of x? for these data, the small frequencies at 
the tails of the distributions are combined, a procedure that is generally 
advisable with data of this type. Problems in the application of chi square 
resulting from the presence of small frequencies are discussed later in this 
chapter. With the present data, combining small frequencies reduces the 
number of frequencies from 11 to 9 and the number of degrees of freedom 
from 10 to 8. The value of χ² for these data is 9.55. The value required
for significance at the 5 per cent level for 8 degrees of freedom is 15.51. The
conclusion is that the evidence is insufficient to justify rejection of the null
hypothesis. Reference to a table of χ² shows that a value of χ² equal to or
greater than the one observed might be expected to occur in about 30 per 
cent of samples due to sampling fluctuation alone. 

Where the theoretical frequency distribution is continuous, we require a 
method for the estimation of the theoretical frequencies. In fitting a con- 
tinuous curve we calculate the proportion of the area under the theoretical 
curve corresponding to each class interval. This proportion multiplied by 
N is taken as the theoretical frequency within the class interval. This 
procedure is illustrated in Table 11.4. The data are adapted from McNemar
(1955) and are Stanford-Binet IQ's, Form M, for a sample of 2,970 individuals. 




We are required to calculate the theoretical normal frequencies for the 
class intervals and test the goodness of fit between the theoretical and the 
observed. The mean and standard deviation are X̄ = 104.56 and s = 16.99.
A normal frequency distribution is required with the same N, X̄, and s as
the observed distribution. We proceed by combining the small frequencies 
at the tails of the distribution, as shown in col. 2. This reduces the number 
of frequencies from 14 to 11. The frequency of 16 at the top of the distri- 
bution contains all cases above the exact limit 149.5. The frequency of 12 


TABLE 11.4
CALCULATION OF NORMAL DISTRIBUTION FREQUENCIES FOR
STANFORD-BINET IQ's, FORM M*


(1)          (2)            (3)      (4)          (5)       (6)          (7)          (8)
Class        Frequency      Upper    Deviation    x/s       Proportion   Proportion   Expected
interval     f              limit    from mean              below        within       frequency

160-            3 }
150-159        13 }  16                                                    .0041           12
140-149        55           149.5      44.94      2.645      .9959         .0158           47
130-139       120           139.5      34.94      2.057      .9801         .0512          152
120-129       330           129.5      24.94      1.468      .9289         .1186          352
110-119       610           119.5      14.94       .879      .8103         .1958          582
100-109       719           109.5       4.94       .291      .6145         .2316          688
90-99         592            99.5      −5.06      −.298      .3829         .1950          579
80-89         338            89.5     −15.06      −.886      .1879         .1177          350
70-79         130            79.5     −25.06     −1.475      .0702         .0506          150
60-69          48            69.5     −35.06     −2.064      .0196         .0155           46
50-59           7 }          59.5     −45.06     −2.652      .0041         .0041           12
40-49           4 }  12
30-39           1 }

Total       2,970                                                         1.0000        2,970


* Adapted from Quinn McNemar, Psychological statistics, John Wiley & Sons, Inc., New
York, 1955.


at the bottom of the distribution contains all cases below the exact limit 
59.5. We next record the exact upper limits of the class intervals (col. 3), 
convert these to deviations from the mean of 104.56 (col. 4), and divide by 
the standard deviation 16.99 to obtain the standard score x/s (col. 5). Thus 
the exact upper limits of the class intervals are expressed in standard measure. 
For example, the exact upper limit of the interval 140 to 149 is 149.5. This 
as a deviation from the mean is 149.5 − 104.56 = 44.94. Dividing this by
the standard deviation 16.99, we obtain 44.94/16.99 = 2.645. We then
consult a table of areas under the normal curve and ascertain the proportions 




of the area under the normal curve falling below the standard-score values 
x/s of col. 5. These proportions are shown in col. 6. We observe that a 
proportion .9959 of the area of the normal curve falls below 2.645 standard 
deviation units above the mean, a proportion .9801 falls below 2.057 units 
above the mean, and so on. By subtraction we obtain the proportions of the
area of the normal curve falling within the class intervals (col. 7). The
proportion above the exact limit 149.5 is 1.0000 − .9959 = .0041. The
proportion between 139.5 and 149.5 is .9959 − .9801 = .0158. The
proportion between 129.5 and 139.5 is .9801 − .9289 = .0512, and so on. By
multiplying these proportions by N we obtain the expected frequencies
(col. 8). 

The above method simply involves converting the exact limits of the class 
intervals to standard deviation units, using a table of areas under the normal 
curve to find the proportion of the area within these limits and multiplying 
these proportions by N to obtain the expected frequencies. 
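The procedure just summarized can be sketched with Python's standard library, with `statistics.NormalDist` taking the place of a printed table of normal-curve areas. The variable names are assumptions made for illustration. Intervals are listed from the lowest (below 59.5) to the highest (above 149.5).

```python
from statistics import NormalDist

mean, sd, n = 104.56, 16.99, 2970
# Exact upper limits of the class intervals after the tails are combined (col. 3).
upper_limits = [59.5, 69.5, 79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]

dist = NormalDist(mu=mean, sigma=sd)
below = [dist.cdf(u) for u in upper_limits] + [1.0]   # proportion below each limit

expected = []
previous = 0.0
for p in below:
    expected.append((p - previous) * n)   # proportion within interval, times N
    previous = p

for e in expected:
    print(round(e, 1))
```

Because the sketch carries more decimal places than a four-place table, individual expected frequencies may differ from the tabled values by about one unit.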


TABLE 11.5
GOODNESS OF FIT OF NORMAL FREQUENCIES TO FREQUENCY DISTRIBUTION OF
STANFORD-BINET IQ's, FORM M*


Class
interval         O                E       O − E    (O − E)²/E

160-            3 }
150-159        13 }  16          12          4        1.33
140-149        55                47          8        1.36
130-139       120               152        −32        6.74
120-129       330               352        −22        1.38
110-119       610               582         28        1.35
100-109       719               688         31        1.40
90-99         592               579         13         .29
80-89         338               350        −12         .41
70-79         130               150        −20        2.67
60-69          48                46          2         .09
50-59           7 }
40-49           4 }  12          12          0         .00
30-39           1 }

Total       2,970             2,970          0    χ² = 17.02


* Adapted from Quinn McNemar, Psychological statistics, John Wiley & Sons, Inc., New
York, 1955.


Table 11.5 shows the calculation of χ² in comparing the observed with
expected frequencies. A value of χ² = 17.02 is obtained. In this case the
number of df is 11 — 3 = 8. Although there are 11 frequencies, 8 only are 
free to vary. The loss of 3 degrees of freedom results because the observed 




and expected distributions are made to agree on N, X̄, and s. For df = 8,
the value of χ² required for significance at the 5 per cent level is 15.51 and at
the 1 per cent level 20.09. The obtained χ² falls between these two values at
about the 3 per cent level. Thus the chances are 3 in 100 that a fit as poor as or
poorer than the one observed would result in random sampling from a normal
population. This establishes grounds for rejecting the hypothesis that the 
distribution of Stanford-Binet IQ's, Form M, is normal. The departures 
from normality are, however, not gross. 
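The χ² value and its degrees of freedom can be checked with a brief Python sketch that uses the combined observed frequencies and the rounded expected frequencies of the table. The variable names are assumptions made for illustration.

```python
# Observed and expected frequencies after combining the tails,
# listed from the highest class interval to the lowest.
observed = [16, 55, 120, 330, 610, 719, 592, 338, 130, 48, 12]
expected = [12, 47, 152, 352, 582, 688, 579, 350, 150, 46, 12]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Three constraints: the fitted curve agrees with the data on N, the mean, and s.
df = len(observed) - 3
print(round(chi2, 2), df)
```

The sum computed without rounding each term falls very slightly below the tabled 17.02, which is obtained by adding terms already rounded to two places.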

Chi square may be used to test the representativeness of a sample where 
certain population values are known. This in effect is a test of goodness of 
fit. To illustrate, in a study of attitudes toward immigrants, a sample of 
200 cases is drawn from the city of Montreal. The observed frequencies 
and percentages by racial origin are shown in Table 11.6. 


TABLE 11.6
APPLICATION OF χ² IN COMPARING SAMPLE FREQUENCIES OF
RACIAL ORIGIN WITH POPULATION FREQUENCIES


Racial      Sample,     Population,
origin      per cent    per cent        O       E     O − E    (O − E)²/E

French        47.5        62.5         95     125      −30        7.20
English       33.5        19.4         67      39       28       20.10
Other         19.0        18.1         38      36        2         .11

Total        100.0       100.0        200     200        0    χ² = 27.41


The population percentages obtained from census returns are also shown.
These population percentages are used to obtain the expected, or theoretical,
frequencies. The value of χ² is 27.41. For df = 2 this is highly significant,
the value required for significance at the 1 per cent level being 9.21. We may 
conclude that the sample is biased and cannot be considered a random sample 
with respect to racial origin. Since attitudes toward immigrants may be 
linked to racial origin, results obtained on this sample may be highly ques- 
tionable unless a correction is applied to adjust for the sample bias. 


11.4. Tests of Independence 


A frequent application of chi square occurs where the data are comprised 
of paired observations on two nominal variables. We wish to know whether 
the variables are independent of each other or associated. To illustrate, 
Table 11.7 presents data collected by Woo (1928) on the relationship between 
eyedness and handedness in a sample of 413 subjects. Subjects were tested 
for eyedness and handedness and grouped in one of three categories on both 
variables. Paired observations were available for each subject. One subject 




was left-handed and ambiocular, another right-handed and right-eyed, and 
so on. The paired observations were entered in a bivariate frequency table 
as shown in Table 11.7. Such tables are analogous to correlation tables. 


TABLE 11.7
CONTINGENCY TABLE SHOWING RELATIONSHIP BETWEEN EYE AND HAND LATERALITY
FOR 413 SUBJECTS AND CALCULATION OF EXPECTED VALUES


                 Left-eyed   Ambiocular   Right-eyed   Total

Left-handed         34           62           28        124
                  (35.4)       (58.5)       (30.0)
Ambidextrous        27           28           20         75
                  (21.4)       (35.4)       (18.2)
Right-handed        57          105           52        214
                  (61.1)      (101.0)       (51.8)

Total              118          195          100        413


Calculation of expected values:


(124 × 118)/413 = 35.4    (124 × 195)/413 = 58.5     (124 × 100)/413 = 30.0
( 75 × 118)/413 = 21.4    ( 75 × 195)/413 = 35.4     ( 75 × 100)/413 = 18.2
(214 × 118)/413 = 61.1    (214 × 195)/413 = 101.0    (214 × 100)/413 = 51.8

They are used to study the independence or association of the two variables. 
Tables of this kind are spoken of as contingency tables. With such tables 
chi square provides an appropriate test of independence. 

In applying chi square to a contingency table to test independence, the 
expected cell frequencies are derived from the data. The expected cell 
frequencies are those we should expect to obtain if the two variables were 
independent of each other, given the marginal totals of the rows and columns. 
Chi square provides a measure of the discrepancy between the observed cell 
frequencies and those expected on the basis of independence. If the value of
chi square is considered significant at some accepted level, usually either the 
5 or the 1 per cent level, we reject the null hypothesis that no difference 
exists between the observed and expected values. We then accept the 
alternative hypothesis that the two variables are associated. 

How are the expected cell frequencies calculated? The marginal totals 
to the right of Table 11.7 show that 124 subjects were left-handed, 75 
ambidextrous, and 214 right-handed. The proportions in these three
categories are 124/413, 75/413, and 214/413. These proportions are the probabilities that
an individual, selected at random from the sample of 413 individuals, is 




left-handed, ambidextrous, or right-handed. The marginal totals at the 
bottom of Table 11.7 show that 118 subjects are left-eyed, 195 ambiocular, 
and 100 right-eyed. The proportions in these three categories are 118/413,
195/413, and 100/413. These are the probabilities that an individual is left-eyed,
ambiocular, or right-eyed. Assuming the independence of the two variables,
what are the expected probabilities associated with the joint events, or what
is the expected proportion of left-eyed people who are left-handed, of
left-eyed people who are ambidextrous, and so on? The multiplication theorem of
probability states that the probability of the joint occurrence of two or more
mutually independent events is the product of their separate probabilities. The joint
probabilities are obtained, therefore, by multiplying the probabilities
obtained from the marginal totals. The probability that any individual,
selected at random from the 413 individuals, is left-handed is 124/413. The
probability that any individual is left-eyed is 118/413. If handedness and
eyedness are independent, the probability that any individual is both left-eyed
and left-handed is the product of the separate probabilities, or 124/413 × 118/413.
This is the expected proportion in the top left-hand cell in Table 11.7. We
require, however, not the expected proportion, but the expected frequency.
This is obtained by multiplying the expected proportion by N, in this case 413.
Thus the expected frequency is (124/413 × 118/413) × 413 = (124 × 118)/413 = 35.4.
We observe that for computational purposes the expected cell frequency is
obtained by multiplying together the first row and column totals and
dividing by N. Similarly, the other expected cell frequencies may be
obtained. The expected frequency of left-handed ambiocular individuals is
(124 × 195)/413 = 58.5, of left-handed right-eyed individuals
(124 × 100)/413 = 30.0, and so on. The expected cell frequencies are shown in brackets
in Table 11.7. 
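
The rule of multiplying a row total by a column total and dividing by N can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the original procedure: the marginal totals are those quoted above for Table 11.7, and the name `expected_frequencies` is hypothetical.

```python
# Marginal totals quoted in the text for Table 11.7.
row_totals = {"left-handed": 124, "ambidextrous": 75, "right-handed": 214}
col_totals = {"left-eyed": 118, "ambiocular": 195, "right-eyed": 100}


def expected_frequencies(rows, cols):
    """Return {(row, col): row_total * col_total / N} under independence."""
    n = sum(rows.values())
    return {
        (r, c): r_total * c_total / n
        for r, r_total in rows.items()
        for c, c_total in cols.items()
    }


expected = expected_frequencies(row_totals, col_totals)
for (r, c), e in expected.items():
    print(f"{r:13s} {c:11s} {e:6.1f}")
```

Each printed value should agree, to one decimal, with the bracketed expected frequencies described for Table 11.7.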

If eye and hand laterality are independent of each other, the 124 observa- 
tions in the first row of Table 11.7 will be distributed in the three cells in that 
row in a manner proportional to the column sums. The expected values
35.4, 58.5, and 30.0 are proportional to the column sums 118, 195, and 100.
Likewise, the 118 cases in the first column will be distributed in the three
cells in that column in a manner proportional to the row sums. The expected
values 35.4, 21.4, and 61.1 are proportional to the row sums 124, 75, and 214. 
A similar proportionality exists throughout the table. The expected cell 
frequencies in the rows and columns of any contingency table are propor- 
tional to the marginal totals. 

The calculation of χ² for a contingency table is similar to that for tests of
goodness of fit. The difference between each observed and expected value
is squared and divided by the expected value, obtaining (O − E)²/E. These
values are summed over all cells to obtain χ². The calculation is perhaps
most readily accomplished by writing the data in columnar fashion as shown
in Table 11.8. The value of χ² obtained is 4.021. The number of degrees of
freedom associated with this value of χ² is 4. The value of χ² required
for significance at the 5 per cent level is 9.488. We have therefore no grounds
for rejecting the hypothesis of independence between eye and hand laterality.
Apparently there is no relationship between these two variables.
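
The whole calculation can be sketched as a short Python function. This is an illustrative sketch only: the function name `chi_square_independence` is hypothetical, and the ninth observed frequency (52) is not quoted directly in the text but is inferred here from the marginal totals (214 − 57 − 105).

```python
# Observed frequencies (rows: left-handed, ambidextrous, right-handed;
# columns: left-eyed, ambiocular, right-eyed). The last cell, 52, is an
# assumption inferred from the marginal totals given in the text.
observed = [
    [34, 62, 28],
    [27, 28, 20],
    [57, 105, 52],
]


def chi_square_independence(table):
    """Return (chi square, degrees of freedom) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_square_independence(observed)
print(f"chi square = {chi2:.2f} with {df} degrees of freedom")
```

Because the sketch carries unrounded expected frequencies, its result may differ slightly in the third decimal from a hand computation that rounds each expected value first.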


TABLE 11.8
CALCULATION OF χ² FOR DATA OF TABLE 11.7

    O        E      O − E    (O − E)²/E
   34      35.4     −1.4        .055
   62      58.5      3.5        .209
   28      30.0     −2.0        .133
   27      21.4      5.6       1.465
   28      35.4     −7.4       1.547
   20      18.2      1.8        .178
   57      61.1     −4.1        .275
  105     101.0      4.0        .158
   52      51.8       .2        .001

                           χ² = 4.021


How is the number of degrees of freedom calculated? In testing independence
in any contingency table comprised of R rows and C columns the
number of degrees of freedom is given by (R − 1)(C − 1). For Table 11.7
R = 3 and C = 3. The number of degrees of freedom is (3 − 1)(3 − 1) = 4.
For a table comprised of two rows and two columns, referred to as a 2 × 2, or
fourfold, table, the number of degrees of freedom is (2 − 1)(2 − 1) = 1.
Consider for explanatory purposes the 2 × 2 table:


               |  60
               |  40
   30     70   | 100


Given the restrictions of the marginal totals, if one cell value is known, the 
remaining three values are determined. Thus if we know that the value in 
the top left cell is 25, the top right cell must be 60 — 25 = 35, the bottom left 
30 — 25 = 5, and the bottom right 40 — 5 = 35. If one cell value is known, 
no freedom of variation remains. One degree of freedom only is associated 
with the variation in the data. Similarly, in Table 11.7 only four cell values 
are free to vary. Given the marginal totals and four cell values, the remain- 
ing cell values are determined. 

A frequently occurring type of contingency table is the 2 × 2, or fourfold,
contingency table. A χ² test for independence can be readily obtained for
such a table without calculating the expected values. Let us represent the
cell and marginal frequencies by the following notation:


  A        B      A + B
  C        D      C + D
A + C    B + D      N


Chi square may then be calculated by the formula 


$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (11.2)$$


Note that the term in the numerator, AD − BC, is simply the difference
between the two cross products and the term in the denominator is the
product of the four marginal totals.

Consider the following 2 × 2 table showing the relationship between
ratings of successful or unsuccessful on a job and pass or fail on an ability-test
item.


                    Test item
                  Fail     Pass
Successful         20       40       60
Unsuccessful       25       15       40
                   45       55      100


Is there an association between performance on the job and performance on 
the test item? Does the item differentiate significantly between the suc- 
cessful and unsuccessful individuals? Chi square is as follows: 


$$\chi^2 = \frac{100(20 \times 15 - 40 \times 25)^2}{60 \times 40 \times 45 \times 55} = 8.25$$


For df = 1, a χ² = 8.25 is significant at better than the 1 per cent level.
The data provide fairly conclusive evidence that the test item differentiates 
between individuals on the basis of their job performance. 
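
The shortcut for a fourfold table is easy to check numerically. The sketch below is illustrative only; the function names are hypothetical, and the cell values are those appearing in the worked computation above. It also confirms that the shortcut agrees with the general method of summing (O − E)²/E over the four cells.

```python
def chi_square_2x2(a, b, c, d):
    """Shortcut: N(AD - BC)^2 over the product of the four marginal totals."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_from_expected(a, b, c, d):
    """General method: sum of (O - E)^2 / E over the four cells."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    cells = ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1))
    return sum((o - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for o, i, j in cells)


# A = 20, B = 40, C = 25, D = 15, as in the worked example.
shortcut = chi_square_2x2(20, 40, 25, 15)
general = chi_square_from_expected(20, 40, 25, 15)
print(f"shortcut = {shortcut:.2f}, general = {general:.2f}")
```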


11.5. The Application of χ² in Testing the Significance of the
Difference between Proportions


In Chap. 10 procedures were described for testing the significance of the 
difference between both independent and correlated proportions. These 
procedures involved dividing the difference between two proportions by the 
standard error of the difference to obtain a normal deviate which could be
referred to a table of areas under the normal curve. Because of a simple
relationship for 1 degree of freedom between χ² and the normal deviate, χ²
provides an alternative but equivalent procedure for testing the significance
of the difference between proportions. For 1 degree of freedom it may
be shown that χ² is equal to the normal deviate squared. Thus χ² = (x/s)² = z²,
or √χ² = z.

We shall now consider the use of χ² in testing the significance of the
difference between proportions for independent samples. Let the following 
be data obtained in response to an attitude-test statement for a group of 
males and females: 


                Frequency                              Proportion
           Agree   Disagree                       Agree   Disagree
Males        70       70      140      Males       .500     .500     1.00
Females      20       40       60      Females     .333     .667     1.00
             90      110      200                  .450     .550     1.00


The numbers of males and females are N₁ = 140 and N₂ = 60, respectively.
The proportions of males and females indicating agreement to the attitude
statement are p₁ = 70/140 = .500 and p₂ = 20/60 = .333. Is there a significant
difference in the attitudes of males and females? To apply the method
previously described we calculate a proportion p based on a combination of
data for the two samples. With the above data

$$p = \frac{70 + 20}{140 + 60} = .450 \qquad q = 1 - p = 1 - .450 = .550$$

The required normal deviate is then

$$z = \frac{p_1 - p_2}{\sqrt{pq[(1/N_1) + (1/N_2)]}} = \frac{.500 - .333}{\sqrt{.450 \times .550\left(\frac{1}{140} + \frac{1}{60}\right)}} = 2.172$$


The difference between the two proportions falls between the 5 and 1 per cent 
levels. Reference to a table of areas under the normal curve shows that the 
proportion of the area falling beyond plus and minus 2.172 standard deviation 
units from the mean is close to .03. The difference may be taken as significant
at about the 3 per cent level. Let us now apply the formula for calculating
χ² for a 2 × 2 contingency table to the same data. We have

$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} = \frac{200(70 \times 40 - 70 \times 20)^2}{140 \times 60 \times 90 \times 110} = 4.717$$


Consulting a table of χ² with 1 degree of freedom, we observe that the proportion
of the area in the tail of the distribution of χ² is about .03 and the difference
between proportions may be said to be significant at about the 3 per
cent level. We observe also that χ² = (2.172)² = 4.717. The two procedures
for testing the significance of the difference between proportions for
independent samples lead to identical results. From a computational
viewpoint the χ² test is the more convenient. Considerations pertaining to
small frequencies apply also to the application of χ² in testing the significance
of the difference between proportions (Sec. 11.6).
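
The equivalence of the two procedures can be verified directly. The following Python sketch is an illustration only, using the male and female frequencies given above; the variable names are hypothetical. Because it carries unrounded proportions throughout, its printed values may differ in the third decimal from those of a hand computation.

```python
from math import sqrt

# Frequencies from the attitude-test example: rows are males and females,
# columns are agree and disagree.
a, b = 70, 70   # males
c, d = 20, 40   # females

n1, n2 = a + b, c + d
p1, p2 = a / n1, c / n2
p = (a + c) / (n1 + n2)   # proportion based on the combined samples
q = 1 - p

# Normal deviate for the difference between independent proportions.
z = (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))

# Chi square for the same data from the fourfold-table formula.
n = n1 + n2
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(f"z = {z:.3f}, z squared = {z * z:.3f}, chi square = {chi2:.3f}")
```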

Where the data are correlated and are composed of paired observations
the normal deviate for testing the significance of the differences between
proportions is given (Sec. 10.12) by the formula z = (D − A)/√(A + D),
where D and A are cell frequencies in the bottom right and top left cells,
respectively, of a 2 × 2 table. Instead of calculating a critical ratio and
referring this to the normal curve, we may calculate χ² by the formula

$$\chi^2 = \frac{(D - A)^2}{A + D} \qquad (11.3)$$

For the data shown in Sec. 10.12, where we wish to test the significance of the
differences between proportions of agreements to an attitude question for the
same individuals tested on two occasions, we obtain a z = 3.16. The difference
is significant at better than the 1 per cent level. The value of the
probability is .0016. The value of χ² calculated on the same data is (3.16)²,
or 9.986. The probability is the same as before.


11.6. Small Expected Frequencies 


The distribution of χ² used in determining critical significance values is a
continuous theoretical frequency curve. Where the expected frequencies
are small, the actual sampling distribution of χ² may exhibit marked discontinuity.
The continuous curve may provide a poor fit to the data, and
appreciable error may occur in the estimation of probabilities, these being
areas under the continuous χ² curve. The situation here is analogous to that
found in using the normal curve as a fit to the binomial. For small values
of N the continuous normal curve is a poor fit to the discrete binomial.

For 1 degree of freedom a correction may be applied known as Yates's
correction for continuity. To apply this correction we reduce by .5 the
obtained frequencies that are greater than expectation and increase by .5
the obtained frequencies that are less than expectation. This brings the
observed and expected values closer together and decreases the value of
χ². This correction should be used where any of the expected frequencies is
less than 5, and some writers suggest 10. For large expected frequencies the
correction will be negligible.


The formula used in computing χ² from a 2 × 2 table can be written to
incorporate Yates's correction for continuity. This formula becomes

$$\chi^2 = \frac{N(|AD - BC| - N/2)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (11.4)$$

The term |AD − BC| is the absolute difference, that is, the difference taken
regardless of sign. The correction amounts to subtracting N/2 from this
absolute difference.

The following data show the relationship between sociometric choices
for a group of 20 Protestant and Jewish school children.


                           Chosen
                    Protestant   Jewish
Chooser  Jewish          3          5         8
         Protestant     10          2        12
                        13          7        20


The value of χ² using Yates's correction is

$$\chi^2 = \frac{20(|6 - 50| - 10)^2}{8 \times 12 \times 13 \times 7} = 2.65$$

This value falls at about the 10 per cent level. The evidence does not
justify the rejection of the hypothesis that sociometric choice is independent
of whether the child is Jewish or Protestant. We note that if χ² is calculated
on these data without Yates's correction, a value χ² = 4.43 is obtained. This
value falls at about the 3.5 per cent level. If Yates's correction had not been
used, the result would be considered significant at better than the 5 per cent
level.
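
The effect of the correction on these data can be reproduced with a short Python sketch. It is illustrative only: the function name and the `yates` flag are hypothetical, and clamping the corrected difference at zero is an assumption added here to guard against over-correction, not something stated in the text.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi square for a fourfold table, optionally with Yates's correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Subtract N/2 from the absolute difference. The clamp at zero is
        # an assumption of this sketch, not part of formula (11.4).
        diff = max(diff - n / 2, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Sociometric data: A = 3, B = 5, C = 10, D = 2.
corrected = chi_square_2x2(3, 5, 10, 2, yates=True)
uncorrected = chi_square_2x2(3, 5, 10, 2)
print(f"with correction: {corrected:.2f}, without: {uncorrected:.2f}")
```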

With 2 or more degrees of freedom the error introduced by small expected 
frequencies is of less consequence than with 1 degree of freedom. An expecta- 
tion of not less than 2 in each cell will permit the estimation of roughly 
approximate probabilities. If the frequencies are 5 or more, good approxima- 
tions to the exact probabilities are obtained. With certain types of data it is 
a common practice to combine frequencies. In testing the goodness of fit
of a theoretical to an observed frequency distribution, small frequencies at 
the tails may be combined. On occasion it may be possible without serious 
distortion of the data to combine rows and columns of a contingency table to 
increase the expected cell frequencies. 

With 1 degree of freedom where the expected frequencies are small, an 
exact test of significance may be applied. This involves the determination 
of exact probabilities, as distinct from those estimated from the continuous 
χ² curve. An exact test of significance for a 2 × 2 table is described below.


11.7. Exact Test of Significance for a 2 × 2 Table


An exact test of significance for a 2 X 2 table has been developed by 
R. A. Fisher. This test enables the calculation of exact probabilities and 
avoids the use of the continuous chi-square distribution to obtain approxi- 
mate probabilities. It may be used appropriately where the expected cell 
frequencies are small. The principal objection to its use is the laborious 
calculation required. 

In tossing a number of coins a finite number of events may result. In
tossing six coins, seven outcomes are possible. We may toss 0, 1, 2, 3, 4, 5,
or 6 heads. The binomial distribution may be used to determine the exact
probabilities associated with these seven outcomes. Similarly, for any
2 × 2 table, given the restrictions imposed by the marginal totals, a finite
number of arrangements of the cell frequencies may result. For example, for
a table with row totals of 3 and 8 and column totals of 5 and 6,
only four arrangements of the cell frequencies are possible. These are as
follows:


      (1)               (2)               (3)               (4)
  0   3 |  3        1   2 |  3        2   1 |  3        3   0 |  3
  5   3 |  8        4   4 |  8        3   5 |  8        2   6 |  8
  5   6 | 11        5   6 | 11        5   6 | 11        5   6 | 11

The exact probability associated with each arrangement may be calculated.
To conceptualize the situation here, consider an urn containing three black 
and eight white balls. Withdraw the balls one at a time and assign five of 
them at random to a black box and six to a white box. Count the number of 
black balls in the black box. Repeat the experiment many times and calcu- 
late the relative frequencies of the four possible outcomes. These relative 
frequencies are experimentally determined estimates of the probabilities of 
occurrence of the four possible 2 X 2 tables. The required probabilities may 
be calculated without this laborious experimental procedure. The proba- 
bility of any arrangement of cell frequencies, given the marginal restrictions, 
is obtained by 

$$p = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{N!\,A!\,B!\,C!\,D!} \qquad (11.5)$$


The numerator is the product of the factorials of the marginal totals. The
denominator is N! times the product of the factorials of the cell frequencies.
The factorial of any number, say, 5, is 5 × 4 × 3 × 2 × 1 = 120; also
0! = 1. The probabilities associated with the four tables above are


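$$\begin{aligned}
(1)\quad p_1 &= \frac{3!\,8!\,5!\,6!}{11!\,0!\,3!\,5!\,3!} = \frac{4}{33} = .1212 \\
(2)\quad p_2 &= \frac{3!\,8!\,5!\,6!}{11!\,1!\,2!\,4!\,4!} = \frac{15}{33} = .4545 \\
(3)\quad p_3 &= \frac{3!\,8!\,5!\,6!}{11!\,2!\,1!\,3!\,5!} = \frac{12}{33} = .3636 \\
(4)\quad p_4 &= \frac{3!\,8!\,5!\,6!}{11!\,3!\,0!\,2!\,6!} = \frac{2}{33} = .0606 \\
\text{Sum} &= .9999
\end{aligned}$$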


Clearly, in this case we have no grounds for rejecting the hypothesis that 
the two variables are independent. The probability of obtaining a degree 
of association equal to or better than the one observed, and in the same 
direction, is obtained by summing the probabilities of arrangements 3 and 4. 
This probability is .3636 + .0606 = .4242. Thus in about 42 samples in 
100, a result equal to or better than the one observed would occur by chance. 
With the present data no arrangement of the 2 X 2 table can lead to a 
statistically significant result. 
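
The laborious arithmetic of the exact test is readily mechanized. The Python sketch below is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, and the marginal totals (rows 3 and 8, columns 5 and 6) are those of the example above. It enumerates every arrangement consistent with the marginal totals and applies the factorial formula to each.

```python
from math import factorial


def table_probability(a, b, c, d):
    """Exact probability of one arrangement, given the marginal totals."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


def arrangements(row1, row2, col1):
    """Yield every (A, B, C, D) consistent with the marginal totals."""
    for a in range(max(0, col1 - row2), min(row1, col1) + 1):
        c = col1 - a
        yield a, row1 - a, c, row2 - c


probs = {cells: table_probability(*cells) for cells in arrangements(3, 8, 5)}
for cells, prob in probs.items():
    print(cells, f"{prob:.4f}")

# Probability of the observed arrangement (3) or the more extreme one (4).
tail = probs[(2, 1, 3, 5)] + probs[(3, 0, 2, 6)]
print(f"one-tailed probability = {tail:.4f}")
```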

Usually the probabilities associated with all possible arrangements of the 
2 X 2 table need not be calculated. We need only calculate the probabilities 
associated with the observed table and those that represent more extreme 
departures from expectation in the same direction. Let table 1 below repre- 
sent the observed data. Tables 2 and 3 are the two more extreme tables in 
the same direction. 


      (1)               (2)               (3)
  4   2 |  6        5   1 |  6        6   0 |  6
  3   5 |  8        2   6 |  8        1   7 |  8
  7   7 | 14        7   7 | 14        7   7 | 14


The probabilities associated with these three tables are .2448, .0490, and 
.0023. The sum of these probabilities is .2961. This falls far short of
significance, and we conclude that the evidence is insufficient to warrant 
rejection of the hypothesis of independence. The sum of the probabilities 
associated with tables 2 and 3 is .0513. Thus the arrangement of table 2 
above, if it did occur, would fall short of significance at the 5 per cent level 
for a one-tailed test. The only arrangement of the three shown which could 
lead to a conclusion of significance, given the marginal restrictions, is that 
shown in table 3. 

Tables to assist the application of exact tests of significance to 2 X 2 tables 
have been prepared by Finney (1948). An adaptation of Finney’s tables is 
given by Siegel (1956). 


11.8. Miscellaneous Observations on Chi Square 


In this section we shall consider a number of miscellaneous points about
χ² not hitherto discussed.

One-tailed and two-tailed tests. Tables of χ² used for tests of significance
are based on one tail only, the tail to the right, of the sampling distribution of
χ². Table C of the Appendix shows that for 1 degree of freedom 5 per cent
of the area of the distribution falls to the right of χ² = 3.84 and 1 per cent to
the right of χ² = 6.64. These are not critical values for directional, or
one-tailed, tests as described in Chap. 10. Although one tail only of the sampling
distribution of χ² is used, the tabled values are those required for testing the
significance of a difference regardless of direction, that is, for two-tailed tests.
The critical ratio or normal deviate required for significance at the 5 per cent
level for a two-tailed test is 1.96. If this value is squared, we obtain 3.84,
the χ² value at the 5 per cent level for 1 degree of freedom. For 1 degree of
freedom the square root of χ² is a normal deviate and may be used with
reference to the normal curve in applying two-tailed tests. In effect, because
χ² is the square of the normal deviate for 1 degree of freedom, both tails of
the normal curve are incorporated in the right tail of the χ² curve. In many
situations where χ² is applied, the idea of a directional, or one-tailed, test
has little meaning. In tests of goodness of fit and in most tests of independence
we are usually not concerned with the direction of the difference
observed. If a one-tailed test is required, the proportionate areas in the
chi-square tables should be halved. The value of χ² required for significance
at the 5 per cent level for a one-tailed test is 2.71 for df = 1. The corresponding
value at the 1 per cent level is 5.41. These are the squares of the
normal deviates 1.64 and 2.33 required for significance for a one-tailed test
at the 5 and 1 per cent levels, respectively.
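
The relation between the tabled values of χ² for 1 degree of freedom and the normal deviate can be checked with the Python standard library. The sketch below is illustrative only; the function name `chi2_critical_df1` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def chi2_critical_df1(alpha, one_tailed=False):
    """Critical chi square for df = 1 as the square of a normal deviate."""
    tail_area = alpha if one_tailed else alpha / 2
    z = standard_normal.inv_cdf(1 - tail_area)
    return z * z


for alpha in (0.05, 0.01):
    two = chi2_critical_df1(alpha)
    one = chi2_critical_df1(alpha, one_tailed=True)
    print(f"alpha = {alpha}: two-tailed {two:.2f}, one-tailed {one:.2f}")
```

The two-tailed values correspond to the ordinary tabled values of χ², and the one-tailed values to those obtained by halving the proportionate areas.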

Chi square and sample size. The value of χ² is related to the size of the
sample. If an actual difference exists between observed and expected values,
this difference will tend to increase as sample size increases. χ² will also
increase, and the associated probability value will decrease. Consider the
following tables:


        (1)                   (2)                   (3)
   6    4 | 10          12    8 | 20          24   16 |  40
   4   11 | 15           8   22 | 30          16   44 |  60
  10   15 | 25          20   30 | 50          40   60 | 100
    χ² = 2.78             χ² = 5.56             χ² = 11.12


As the samples are doubled in size from 25 to 50 to 100, the differences
between the observed and expected values, O − E, are doubled and the χ²
values are doubled. If no actual difference exists between observed and
expected values, χ² will tend to remain unchanged as sample size increases.
For a constant difference between observed and expected values χ² will
decrease as sample size increases. If we double sample size and hold the
difference between observed and expected values fixed, the value of χ² will be
reduced by one-half.
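
The doubling of χ² with a doubling of every cell frequency can be confirmed with a brief Python sketch. It is an illustration only: the function name is hypothetical, the base table is the N = 50 table given above, and the doubled table is constructed here by multiplying each of its cells by 2.

```python
def chi_square_2x2(a, b, c, d):
    """Chi square for a fourfold table from the cross-product formula."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


base = (12, 8, 8, 22)                    # the N = 50 table from the text
doubled = tuple(2 * cell for cell in base)  # same proportions, N = 100

chi2_base = chi_square_2x2(*base)
chi2_doubled = chi_square_2x2(*doubled)
print(f"N = 50: {chi2_base:.2f}   N = 100: {chi2_doubled:.2f}")
```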

Alternative formula for chi square. We can readily demonstrate that

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \sum \frac{O^2}{E} - N \qquad (11.6)$$

This alternative way of writing χ² is sometimes useful for computational
purposes.
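
A numerical check of the alternative formula may be sketched as follows. The sketch assumes the alternative form is the sum of O²/E less N, which holds whenever the observed and expected frequencies have the same total N; the data are the job-rating frequencies used earlier in the chapter, and the variable names are hypothetical.

```python
# Observed frequencies from the job-rating example (A, B, C, D).
observed = [20, 40, 25, 15]
n = sum(observed)

# Expected frequencies under independence: row total x column total / N.
rows = (60, 60, 40, 40)
cols = (45, 55, 45, 55)
expected = [r * c / n for r, c in zip(rows, cols)]

# Definitional form: sum of (O - E)^2 / E.
definitional = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Alternative form (assumed here): sum of O^2 / E, less N.
alternative = sum(o * o / e for o, e in zip(observed, expected)) - n

print(f"definitional = {definitional:.3f}, alternative = {alternative:.3f}")
```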

2 × 2 tables with more than 1 degree of freedom. For most 2 × 2 tables the
row and column totals are considered fixed and 1 degree of freedom is
associated with the variation in the data. Situations arise where either the
row or column totals, or both, are free to vary. In a sociometric study on a
class of 8 Jewish and 12 Protestant children, each child may be asked to
choose one other child with whom he would prefer to play. He cannot choose
himself. If choices are independent of whether a child is Jewish or Protestant,
what are the expected frequencies of choices? On a strictly random
basis, how many Jewish choosers will make Protestant choices, and so on?
Since a child cannot choose himself, a Jewish child chooses from among
seven Jewish and 12 Protestant children. The probability of a Jewish child
choosing a Jewish child in making a choice at random is 7/19 and of choosing a
Protestant child 12/19. Since eight choices are made, the expected frequency of
Jewish choices is 7/19 × 8 = 2.95. The expected frequency of Protestant
choices is 12/19 × 8 = 5.05. Likewise, we find that the expected frequency of
Jewish choices by Protestant children is 8/19 × 12 = 5.05, and the expected
frequency of Protestant choices is 11/19 × 12 = 6.95. The expected frequencies
are tabulated below together with the observed frequencies.


                   Expected chosen                 Observed chosen
                 Protestant   Jewish
Chooser
  Jewish            5.05       2.95       8
  Protestant        6.95       5.05      12
                   12         8          20
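
The expected frequencies for random choice, where a child cannot choose himself, can be sketched in Python with exact fractions. This is an illustrative sketch; the function name `expected_choices` is hypothetical, and the class composition (8 Jewish and 12 Protestant children) is that given in the text.

```python
from fractions import Fraction


def expected_choices(n_own, n_other):
    """Expected choices made by one group when each child picks one other
    child at random and cannot pick himself.

    Returns (choices falling in the chooser's own group,
             choices falling in the other group)."""
    pool = n_own + n_other - 1            # children available to be chosen
    p_own = Fraction(n_own - 1, pool)     # one fewer in the chooser's own group
    p_other = Fraction(n_other, pool)
    return n_own * p_own, n_own * p_other


jewish_to_jewish, jewish_to_protestant = expected_choices(8, 12)
protestant_to_protestant, protestant_to_jewish = expected_choices(12, 8)

for label, value in [
    ("Jewish chooser, Jewish chosen", jewish_to_jewish),
    ("Jewish chooser, Protestant chosen", jewish_to_protestant),
    ("Protestant chooser, Jewish chosen", protestant_to_jewish),
    ("Protestant chooser, Protestant chosen", protestant_to_protestant),
]:
    print(f"{label:38s} {float(value):.2f}")
```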


In this example the row totals are fixed. The column totals are free to
vary from expectation. In these data we observe a tendency for both Jewish
and Protestant children to choose Protestant children more frequently than
expectation. In this case a χ² based on a comparison of the expected and
observed cell frequencies has 2 degrees of freedom. χ² may of course be
applied to the observed frequencies in the usual way with 1 degree of freedom.
This is a test of association, within the restrictions of the marginal totals, of
the religious affiliation of the choosers and the chosen. It is not a test of
randomness of choice. Fourfold tables may occur where both row and
column totals are free to vary. Such tables arise where all expected frequencies
are derived in a manner entirely independent of the data. χ² here
has 3 degrees of freedom.

Reduction of an R × C table to a 2 × 2 table. A table with R rows and
C columns may be reduced to a 2 × 2 table in order to facilitate a rapid test
of association with χ². This procedure is legitimate enough provided the
points of dichotomy of the two variables are made without reference to the
cell frequencies. The investigator may decide a priori to dichotomize about
the two medians, or something of the sort. Data are found where the points
of dichotomy have been located in order to maximize the association in the
data and obtain thereby a significant χ². This practice is spurious and
should be enthusiastically discouraged.


EXERCISES 


1. In 180 throws of a die the observed frequencies of the values from 1 to 6 are 34, 27, 41, 
25, 18, and 35. Test the hypothesis that the die is unbiased. 
2. A psychological test yields a distribution of scores as follows:


Class interval    Frequency
    90-99              1
    80-89              5
    70-79             17
    60-69             30
    50-59             50
    40-49             35
    30-39             10
    20-29              6
    10-19              4
     0-9               2
                     160


Obtain the theoretical normal frequencies. Test the goodness of fit between the
theoretical and the observed frequencies.

3. How many cell frequencies are free to vary in tables with (a) two rows and two columns,
(b) two rows and three columns, (c) three rows and five columns? Assume fixed marginal
totals.
4. The following data relate to patients in a mental hospital:

                          Rating
              Improvement   No improvement
Therapy A         16              28            44
Therapy B          9              37            46
                  25              65            90


Test the hypothesis that method of therapy is independent of rating assigned.



5. The following contingency table describes the relation between scores above and below 
the median on an examination and ratings of job performance for 100 employees. 


                              Rating
                  Below                   Above
                 average     Average     average
Above median       11          25          35         71
Below median       15           7           7         29
                   26          32          42        100


Test the hypothesis that job performance is independent of examination results. 

6. A sample used in a market survey contains 100 males and 100 females. Of the males
33 and of the females 18 state a preference for brand A. Use χ² to test the hypothesis
that no sex differences exist in consumer preference.

7. Calculate χ² for the following tables using Yates's correction for continuity:


(a) Weight 
Increase No increase 
Gentled animals 7 
Ungentled animals 8 
8 7 15 
(b) Locus of lesion 


Impairment in performance 


No impairment in performance 


8. Obtain the exact probabilities associated with all possible arrangements of cell frequen- 
cies for the following 2 X 2 tables: 


In either case would any arrangement of cell frequencies justify a rejection of the hy- 
pothesis of independence? 


CHAPTER 12 


RANK CORRELATION METHODS 


12.1. Introduction 


Ordinal, or rank-order, data may arise in a number of different ways. 
Quantitative measurements may be available, but ranks may be substituted 
to reduce arithmetical labor or to make some desired form of calculation 
possible. For example, measurements of height and weight may be obtained 
for a group of school children. A correlation between the paired measure- 
ments could readily be calculated. The investigator may, however, choose to 
substitute ranks for the measurements and calculate a correlation between 
the paired ranks. In many situations where ranking methods are used, 
quantitative measurements are not available. The measuring operations 
used may be such that no comparative statements about the intervals 
between members are possible. For example, employees may be rank- 
ordered by supervisors on job performance. School children may be ranked 
by teachers on social adjustment. Whiskies may be rank-ordered by expe- 
rienced judges on taste, or participants in a beauty contest may be rank- 
ordered by judges on pulchritude. In such cases the data are comprised of 
sets of ordinal numbers, 1st, 2d, 3d, . . . , Nth. These are replaced by the 
cardinal numbers 1, 2, 3, . . . , N, for purposes of calculation. The sub- 
stitution of cardinal numbers for ordinal numbers always assumes equality 
of intervals. The difference between the 1st and 2d member is assumed 
equal to the difference between the 2d and 3d, and so on. This assumption 
underlies all coefficients of rank correlation. Because of difficulties asso- 
ciated with the measurement of psychological variables, statistical methods 
for handling rank-order data are of particular interest to psychologists. 


12.2. Spearman's Coefficient of Rank Correlation ρ


Consider a group of N individuals, A₁, A₂, A₃, . . . , A_N, ranked on two
variables X and Y. The rankings on X may be denoted as X₁, X₂, X₃, . . . ,
X_N and on Y as Y₁, Y₂, Y₃, . . . , Y_N. A group of five individuals, for
example, may be ranked 1, 2, 3, 4, 5 on race prejudice and 3, 1, 2, 5, 4 on
authoritarianism. The data are comprised of paired integers extending from
1 to N. How may a coefficient of correlation between the ranks be defined?

This problem may be approached by considering the sum of squares of
the differences between the paired ranks. Denote this quantity by Σd².
As in many similar situations, we use the sum of squares instead of the sum.
The sum is equal to zero. What are the minimum and maximum values of
Σd²? When the members are ranked in the same order on both X and Y,
the case of perfect positive correlation, Σd² = 0 and is a minimum. Thus if
the ranks on X are 1, 2, 3, 4, 5 and on Y 1, 2, 3, 4, 5, the differences are all
zero. If the paired ranks are in inverse order, the case of perfect negative
correlation, Σd² is a maximum. No arrangement of X with respect to Y will
produce a larger value of Σd². Thus if the ranks on X are 1, 2, 3, 4, 5 and on
Y 5, 4, 3, 2, 1, the differences d are −4, −2, 0, 2, 4. The squares d² are 16, 4,
0, 4, 16 and Σd² = 40. It may be shown that the maximum value of Σd² is
given by

$$\sum d^2_{\max} = \frac{N(N^2 - 1)}{3}$$

It may also be shown that the value of Σd² expected when ranks on X are
independent of ranks on Y is one-half Σd²_max, or

$$E\left(\sum d^2\right) = \frac{N(N^2 - 1)}{6}$$

Coefficients of correlation are conventionally defined to take the values
+1, 0, and −1 in the presence of a perfect positive, independent, and perfect
negative relation, respectively, between the two variables. In the present
case a measure of rank-order correlation which will meet this requirement
may be defined as

$$\rho = 1 - \frac{2\sum d^2}{\sum d^2_{\max}}$$

where ρ is the Greek letter rho. For a perfect positive correlation Σd² = 0
and ρ = 1. For a perfect negative relation Σd² = Σd²_max and ρ = −1. In
the case of independence, 2Σd² = Σd²_max and ρ = 0. By substituting the
value of Σd²_max in the above formula, we obtain

$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} \qquad (12.1)$$

This is Spearman’s coefficient of rank correlation.
In Chap. 7 we presented the formula for Pearson’s product-moment
correlation coefficient,

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N s_x s_y}$$


Spearman’s ρ is a particular case of the above formulation. It is the particular
case which arises where the variables are the first N consecutive untied
integers. If the above formula is applied directly to paired ranks, the result
is identical with that obtained by applying the formula for ρ.


TABLE 12.1
CALCULATION OF SPEARMAN'S COEFFICIENT OF RANK CORRELATION

                   Rank            Difference
Individual       X      Y          d       d²
   A₁            1      6         −5       25
   A₂            2      3         −1        1
   A₃            3      7         −4       16
   A₄            4      2          2        4
   A₅            5      1          4       16
   A₆            6      8         −2        4
   A₇            7      4          3        9
   A₈            8      9         −1        1
   A₉            9      5          4       16
   A₁₀          10     10          0        0
   Total                           0    Σd² = 92

$$\rho = 1 - \frac{6 \times 92}{10(10^2 - 1)} = .442$$


The calculation of ρ is illustrated in Table 12.1. The calculation is simple.
We find the differences between the paired ranks, square these, sum to
obtain Σd², and then apply the formula for ρ.
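
The steps just described can be sketched in Python. This is a minimal illustration, assuming untied ranks; the function name `spearman_rho` is hypothetical, and the ranks are those of Table 12.1.

```python
def spearman_rho(x_ranks, y_ranks):
    """Spearman's coefficient from paired, untied ranks (formula 12.1)."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Ranks of the ten individuals in Table 12.1.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [6, 3, 7, 2, 1, 8, 4, 9, 5, 10]

rho = spearman_rho(x, y)
print(f"rho = {rho:.3f}")

# The limiting cases discussed above: identical and inverse orderings.
print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
```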


12.3. Spearman's ρ with Tied Ranks


In arranging the members of a group in order, a judge may be unable to 
discriminate between certain members. Where measurements are replaced 
by ranks, certain measurements may be equal. These circumstances give 
rise to tied ranks. If we attempt to replace the numbers 14, 19, 19, 22, 23, 23, 
23, 25 by ranks, we observe immediately that 19 occurs twice and 23 three 
times. Under these circumstances we assign to each member the average 
rank which the tied observations occupy. Thus 14 is ranked 1, the two 
19's are ranked 2.5 and 2.5, the 22 is ranked 4, the three 23's are ranked 6, 6,
and 6, and 25 is ranked 8. Having replaced the tied ranks by their average
rank, we proceed as before in the calculation of ρ. A calculation with tied
ranks is illustrated in Table 12.2. If the ties are numerous, this type of 
adjustment for tied ranks may not prove altogether satisfactory. 
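
The assignment of average ranks to tied observations can be sketched as follows. The function name `average_ranks` is hypothetical, and the sketch is only one of several ways of arriving at the same ranks.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values the
    average of the rank positions they jointly occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1   # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


print(average_ranks([14, 19, 19, 22, 23, 23, 23, 25]))
```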

The development of ρ from the ordinary product-moment r assumes that
the ranks are the first N integers. Where tied ranks occur this is not so.
Where a substantial number of tied ranks is found, the departure of the sum
of squares of ranks from the sum of squares of the first N integers will be
appreciable and the value of ρ will be thereby affected. While other procedures
(Sec. 12.9) for correcting for ties may be used, one convenient
approach is to calculate an ordinary product-moment correlation for the
paired observations where average ranks have been substituted for ties.


TABLE 12.2
CALCULATION OF SPEARMAN'S COEFFICIENT OF RANK CORRELATION WITH
TIED RANKS

                   Rank             Difference
Individual       X       Y          d        d²
   A₁            1       8         −7      49.00
   A₂            2.5     6.5       −4      16.00
   A₃            2.5     4.5       −2       4.00
   A₄            4.5     2          2.5     6.25
   A₅            4.5     1          3.5    12.25
   A₆            6       3          3       9.00
   A₇            8       4.5        3.5    12.25
   A₈            8       6.5        1.5     2.25
   A₉            8       9         −1       1.00
   A₁₀          10      10          0        .00
   Total                            0    Σd² = 112.00

$$\rho = 1 - \frac{6 \times 112}{10(100 - 1)} = .321$$
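
The difference between the two approaches can be seen by applying both to the average ranks of Table 12.2. The Python sketch below is illustrative only; the function names are hypothetical, and the product-moment coefficient is computed from sums of squares and cross products of deviations, a form that does not depend on whether N or N − 1 is used in the standard deviations.

```python
from math import sqrt

# Average ranks from Table 12.2.
x = [1, 2.5, 2.5, 4.5, 4.5, 6, 8, 8, 8, 10]
y = [8, 6.5, 4.5, 2, 1, 3, 4.5, 6.5, 9, 10]


def rho_from_differences(x_ranks, y_ranks):
    """Spearman's formula, which assumes the ranks are the first N integers."""
    n = len(x_ranks)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def product_moment_r(xs, ys):
    """Ordinary product-moment correlation of the paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


print(f"formula for rho:  {rho_from_differences(x, y):.3f}")
print(f"product-moment r: {product_moment_r(x, y):.3f}")
```

With these data the two values differ in the second decimal, which illustrates how ties disturb the assumption that the ranks are the first N integers.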


12.4. Testing the Significance of Spearman's ρ


The study of the sampling distribution of ρ is approached by considering all
possible, and equally probable, arrangements of rankings on Y for a fixed
ranking on X. The model is one where ranks on Y are drawn at random from
a hat and paired successively against fixed ranks on X. For N = 2, if X has
the ranks 1, 2, only two arrangements of Y are possible, 1, 2 and 2, 1. Only
two values of ρ are possible, +1 and −1. For N = 3, if X has the ranks 1, 2,
3, there are six possible arrangements of Y and, as it turns out, four possible
values of ρ, −1, −½, +½, and +1. The sampling distribution of ρ has been
studied by Kendall (1943). For small values of N the sampling distribution
of ρ is bimodal. For N = 7 or N = 8, the distribution has a somewhat
jagged or serrated appearance. As N increases in size, the distribution 
seems to approach the normal form. 
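
The enumeration described above can be carried out directly for small N. The sketch below, in Python and with illustrative function names, lists every arrangement of the Y ranks against fixed X ranks and tallies the resulting values of ρ as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def rho(x, y):
    """Spearman's coefficient from rank differences, as an exact fraction."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - Fraction(6 * sum_d2, n * (n ** 2 - 1))

def exact_distribution(n):
    """Frequency of each value of rho over all N! arrangements of Y."""
    x = tuple(range(1, n + 1))
    return Counter(rho(x, y) for y in permutations(x))

dist3 = exact_distribution(3)
for value in sorted(dist3):
    print(value, dist3[value])
```

For N = 3 the six arrangements yield the four values −1, −½, +½, and +1, the two middle values each occurring twice.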

Table C of the Appendix shows critical values of ρ for different values of
N required for significance at various levels. Observe that for a small N,
values of ρ of very substantial size must be obtained before we have adequate
grounds for rejecting the hypothesis that no association exists between the
rankings. For N = 10 we require a ρ equal to or greater than .564 before we
can argue that a significant association exists in a positive direction at the
5 per cent level.

With N = 10 or greater we may test the significance of ρ by using a t
given by

\[ t = \rho \sqrt{\frac{N - 2}{1 - \rho^2}} \tag{12.2} \]

This quantity has a t distribution with N − 2 degrees of freedom. For
example, where N = 10 and ρ = .564, t = 1.93. For 8 degrees of freedom
the value of t at the .10 level is 1.86. For a two-tailed test at the 5 per cent
level we have insufficient grounds for arguing that the observed ρ is
significantly different from
zero. For a one-tailed test the observed ρ is significant at about the 5 per
cent level. 
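
The arithmetic of formula (12.2) may be sketched as follows. This is an illustrative Python fragment, not part of the original text; the function name is an assumption, and the resulting t must still be referred to a table of t with N − 2 degrees of freedom.

```python
from math import sqrt

def spearman_t(rho, n):
    """t statistic of formula (12.2) for an observed rho based on N pairs."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))

n = 10
t = spearman_t(0.564, n)
df = n - 2
print(round(t, 2), df)
```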


12.5. Kendall’s Coefficient of Rank Correlation τ


An alternative form of rank correlation τ, or tau, has been developed by
Kendall (1943, 1955). Both Spearman's and Kendall’s coefficients apply to
the same type of data. The rationale of Kendall's coefficient is of interest.
On ordering the members of a group a relation is established between every
member and every other member. These relations are of the kind >,
greater than, or <, less than, where ties do not occur. To illustrate, let A1,
A2, A3, A4 be four members ordered with respect to X. This ordering
permits a comparison of every member with every other member. The six
relations are A1 > A2, A1 > A3, A1 > A4, A2 > A3, A2 > A4, and A3 > A4.
For purposes of this discussion these relations may be regarded as units of
information resulting from the ordering operation. For N members the
number of such relations is N(N − 1)/2. Let A2, A1, A4, A3 be the ordering
of the same four members with respect to Y. The six relations generated by
the Y ordering are A2 > A1, A2 > A4, A2 > A3, A1 > A4, A1 > A3, and
A4 > A3. How many of the relations between members on X are true also
of Y? We observe that four of the relations on X are also true with respect 
to Y, and vice versa. Four of the six units of information available on the 
one variable are true also of the other. In predicting a relation on Y from a 
relation on X, two-thirds of such predictions would be correct and one-third 
incorrect. This type of argument can be used as the basis for the definition 
of a coefficient of rank correlation. 
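
The counting of relations in this argument may be sketched in Python. The fragment is illustrative only; the labels A1 to A4 and the two orderings are those of the example, and the helper name is an assumption.

```python
from itertools import combinations

x_order = ["A1", "A2", "A3", "A4"]  # ordering of the members on X
y_order = ["A2", "A1", "A4", "A3"]  # ordering of the same members on Y

def relations(order):
    """All pairs (a, b) in which a stands ahead of b in the ordering."""
    return set(combinations(order, 2))

x_rel = relations(x_order)
y_rel = relations(y_order)
agreements = len(x_rel & y_rel)      # relations true of both X and Y
disagreements = len(x_rel - y_rel)   # relations reversed on Y
print(agreements, disagreements, len(x_rel))
```

Four of the six relations agree and two are reversed, which corresponds to the two-thirds and one-third of the text.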

In comparing relations on X with relations on Y the number of agreements
and disagreements may be counted. A coefficient of rank correlation is
defined as the number of agreements minus the number of disagreements
divided by the maximum possible value of this quantity. Denote the number
of agreements by k, disagreements by l, and the difference, k − l, by S. The
maximum possible value of S occurs where the pairs of ranks are in the same
order and is equal to N(N − 1)/2. The statistic τ is then defined as

\[ \tau = \frac{k - l}{\frac{1}{2}N(N - 1)} = \frac{S}{\frac{1}{2}N(N - 1)} \tag{12.3} \]

Where a perfect positive relation exists between the paired ranks,
S = N(N − 1)/2 and τ = 1. Where a random relation exists between the
paired ranks, the expectation is that k = l, S = 0, and τ = 0. Where a
perfect inverse relation exists between the rankings k = 0,
S = −N(N − 1)/2, and τ = −1.
This is Kendall’s coefficient of rank correlation τ. The rationale of the
statistic given here differs somewhat from that given by Kendall.


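TABLE 12.3
CALCULATION OF KENDALL'S COEFFICIENT OF RANK CORRELATION

                               Agreement    Disagreement
Individual      X       Y        i < j         i > j
    A1          1       1         11             0
    A2          2       5          7             3
    A3          3       2          9             0
    A4          4       6          6             2
    A5          5       7          5             2
    A6          6       3          6             0
    A7          7       4          5             0
    A8          8      10          2             2
    A9          9      11          1             2
    A10        10       8          2             0
    A11        11       9          1             0
    A12        12      12          0             0
    Total                       k = 55        l = 11

S = k − l = 55 − 11 = 44

\[ \tau = \frac{S}{\frac{1}{2}N(N - 1)} = \frac{44}{\frac{1}{2} \times 12 \times 11} = .667 \]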


The practical calculation of τ is shown in Table 12.3. To effect this
calculation arrange the individuals on X in their natural order from 1 to N.
Write down the corresponding paired observations on Y. Compare every
ranking on Y with the rankings below it, and count the number of times
each ranking is less than and greater than the rankings below it. These are
entered in the columns headed i < j and i > j. To illustrate, the top
ranking is 1. This is less than 11 of the rankings below it and greater than 0;
hence 11 is entered in the i < j column and 0 in the i > j column. The
second ranking on Y is 5. This is less than 7 of the rankings below it and
greater than 3; hence 7 is entered in the i < j column and 3 in the i > j
column. The sums of the i < j and i > j columns are the number of
agreements and disagreements, k and l, respectively. In this example
k = 55, l = 11, S = 55 − 11 = 44, and τ = .667. Alternative formulas
for calculating τ are

\[ \tau = \frac{4k}{N(N - 1)} - 1 = 1 - \frac{4l}{N(N - 1)} \tag{12.4} \]

The calculation of both k and l, as in Table 12.3, is probably desirable as a
check. 
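
The counting procedure of Table 12.3 is easily mechanized. The following Python sketch is illustrative and not part of the original text; the function name is an assumption. It takes the Y rankings in the natural order of X, counts agreements and disagreements, and evaluates τ by formula (12.3) and by both forms of (12.4) as a check.

```python
# Y rankings from Table 12.3, listed in the natural order of X (1 to 12).
y = [1, 5, 2, 6, 7, 3, 4, 10, 11, 8, 9, 12]

def kendall_counts(y):
    """Count rankings below each entry that are greater (k) or smaller (l)."""
    k = l = 0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            if y[i] < y[j]:
                k += 1      # agreement: same order as on X
            elif y[i] > y[j]:
                l += 1      # disagreement: order reversed
    return k, l

n = len(y)
k, l = kendall_counts(y)
s = k - l
tau = s / (n * (n - 1) / 2)             # formula (12.3)
tau_from_k = 4 * k / (n * (n - 1)) - 1  # formula (12.4), first form
tau_from_l = 1 - 4 * l / (n * (n - 1))  # formula (12.4), second form
print(k, l, s, round(tau, 3))
```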

Spearman's ρ and Kendall’s τ serve essentially the same purpose. One
may be used as an alternative to the other. When calculated on the same
data, τ will usually have a numerical value smaller than ρ. For large N, τ
will be about two-thirds the size of ρ. The two coefficients maintain a
nearly constant ratio. Difficulties arise in the sampling distribution of ρ
which are not present in τ, although this hardly seems to be a crucial
consideration. τ has been generalized to partial rank correlation.


12.6. Kendall’s τ with Tied Ranks


To compute τ with tied ranks, assign to each member, as before, the
average rank which the tied observations occupy. Apply the formula

\[ \tau = \frac{S}{\sqrt{\left[\frac{1}{2}N(N - 1) - T_x\right]\left[\frac{1}{2}N(N - 1) - T_y\right]}} \tag{12.5} \]

S is as previously defined, a pair tied on either variable counting neither
as an agreement nor as a disagreement. The correction factor
T_x = ½Σt(t − 1), where t is the number of observations in each group of
ties on the X rankings. Similarly, T_y = ½Σt(t − 1) for
the Y rankings. To illustrate the calculation of the correction factor, let
the X rankings be 1, 2.5, 2.5, 4.5, 4.5, 6, 8, 8, 8, 10. Here there are three
groups of ties, two of two rankings each and one of three rankings. The
correction factor in this case is T_x = ½[2(2 − 1) + 2(2 − 1) + 3(3 − 1)] = 5.
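
A minimal Python sketch of formula (12.5) follows. It is illustrative only. The X rankings are those of the example above; pairing them with the Y rankings of Table 12.2 is an assumption made here to supply a complete set of data, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

x = [1, 2.5, 2.5, 4.5, 4.5, 6, 8, 8, 8, 10]
y = [8, 6.5, 4.5, 2, 1, 3, 4.5, 6.5, 9, 10]  # assumed pairing (Table 12.2)

def tie_correction(ranks):
    """T = (1/2) * sum of t(t - 1) over groups of t tied rankings."""
    return sum(t * (t - 1) for t in Counter(ranks).values()) / 2

def sign(v):
    return (v > 0) - (v < 0)

def s_statistic(x, y):
    """Agreements minus disagreements; tied pairs contribute zero."""
    n = len(x)
    return sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
               for i in range(n) for j in range(i + 1, n))

n = len(x)
pairs = n * (n - 1) / 2
t_x, t_y = tie_correction(x), tie_correction(y)
s = s_statistic(x, y)
tau = s / sqrt((pairs - t_x) * (pairs - t_y))
print(t_x, t_y, s, round(tau, 3))
```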


12.7. The Significance of Kendall’s τ


The standard deviation of the sampling distribution of τ in the case of
independence is given by

\[ \sigma_\tau = \sqrt{\frac{2(2N + 5)}{9N(N - 1)}} \tag{12.6} \]

For values of N greater than 10, the normal distribution is an adequate
approximation. In consequence, to test whether an observed τ is signifi-
cantly different from zero, we merely divide it by its standard error σ_τ to
obtain a critical ratio, or normal deviate. We then refer this to a table of the
normal curve. The critical values of this ratio are 1.96 and 2.58 at the 5 and
1 per cent levels for a two-tailed test. To illustrate, a τ = .490 for N = 15
has a standard error of

\[ \sigma_\tau = \sqrt{\frac{2(2 \times 15 + 5)}{9 \times 15(15 - 1)}} = .192 \]


The critical ratio is .490/.192 = 2.55. The observed coefficient is signifi- 
cantly different from zero at the 5 per cent level for a two-tailed test and falls 
just short of significance at the 1 per cent level. 
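
The calculation may be sketched in Python as follows. The fragment is illustrative and the function name is an assumption; the critical values 1.96 and 2.58 are those quoted in the text.

```python
from math import sqrt

def tau_standard_error(n):
    """Standard error of tau under independence, formula (12.6)."""
    return sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

tau, n = 0.490, 15
se = tau_standard_error(n)
z = tau / se                      # critical ratio, or normal deviate
significant_05 = abs(z) >= 1.96   # two-tailed, 5 per cent level
significant_01 = abs(z) >= 2.58   # two-tailed, 1 per cent level
print(round(se, 3), round(z, 2), significant_05, significant_01)
```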

Where N is 10 or less, Table H of the Appendix may be used. Table H
shows probabilities associated with values as large as observed values of
S in the calculation of τ.


12.8. The Coefficient of Concordance W 


For data comprised of m sets of ranks, where m > 2, a descriptive measure
of the agreement or concordance between the m sets is provided by Kendall's
coefficient of concordance W. The data of Table 12.4 are comprised of six
ranks assigned by four judges. These data were obtained in an investigation
on interviewing technique. Four interviewers were required to interview
six job applicants and rank order them on suitability for employment. If
perfect agreement were observed between the four interviewers, one applicant
would be assigned a 1 by all four interviewers. The sum of his ranks would
be 4. Another applicant would be assigned a 2 by all four interviewers. The
sum of his ranks would be 8. The sums of ranks for the six applicants would
be 4, 8, 12, 16, 20, and 24, although not necessarily in that order. In general,
where perfect agreement exists among ranks assigned by m judges to N
members, the sums of ranks form the series m, 2m, 3m, 4m, . . . , Nm. If
the four sets of ranks in Table 12.4 were independent of each other, the sums
of ranks would tend to equality. The sum of ranks assigned by each inter-
viewer is 21. The total sum for the four interviewers is 21 × 4 = 84. In
the case of independence, the expected sum of ranks for each applicant is
84/6 = 14. In general, the sum of N ranks is N(N + 1)/2. The total sum
of N ranks for m judges is mN(N + 1)/2, and the expected rank sum for
each of N applicants in the case of independence is m(N + 1)/2.

TABLE 12.4
RANKS ASSIGNED TO SIX JOB APPLICANTS BY FOUR INTERVIEWERS

[Table body not recoverable; rows are interviewers, columns are applicants.
The rank totals for the six applicants are 20, 12, 8, 10, 12, and 22.]

We observe that the degree of agreement between judges reflects itself in
the variation in the sums of ranks. Where perfect agreement exists, this
variation is a maximum. Where all sets of ranks are independent and bear
a random relation to each other, the variation in rank sums tends to zero.
This observation is the basis for the definition of a coefficient of
concordance. Let R_j represent the rank sum of the jth individual. The sum
of squares of deviations of the rank sums about their mean for N
individuals is

\[ S = \sum_{j=1}^{N} \left( R_j - \frac{\sum R_j}{N} \right)^2 \tag{12.7} \]

The maximum value of this sum of squares occurs where perfect agreement
exists between judges and is equal to

\[ S_{\max} = \frac{m^2(N^3 - N)}{12} \]

The coefficient of concordance W is defined as the ratio

\[ W = \frac{S}{S_{\max}} = \frac{12S}{m^2(N^3 - N)} \tag{12.8} \]

In the presence of perfect agreement between judges, W = 1. In the case of 
independence, W = 0. W does not take negative values. With more than 
two sets of ranks complete disagreement among judges cannot occur. If A 
and B are in complete disagreement and A and C are also in complete dis- 
agreement, then B and C must be in complete agreement. 

In the example of Table 12.4 the rank totals are 20, 12, 8, 10, 12, and 22.
The sum of ranks is 84. The mean rank total, the rank sum expected in the
case of independence, is 84/6 = 14. The sum of squares of deviations about
this mean is

\[ S = (20 - 14)^2 + (12 - 14)^2 + (8 - 14)^2 + (10 - 14)^2 + (12 - 14)^2 + (22 - 14)^2 = 160 \]

In our example m = 4 and N = 6 and the coefficient of concordance is

\[ W = \frac{12 \times 160}{4^2(6^3 - 6)} = .571 \]
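
The calculation of W from the rank totals may be sketched in Python. The fragment is illustrative only and its variable names are assumptions; the rank totals, m, and N are those of the example. The last line applies the standard relation between W and the average Spearman coefficient over all pairs of judges, (mW − 1)/(m − 1).

```python
rank_totals = [20, 12, 8, 10, 12, 22]  # rank sums for the six applicants
m = 4                                  # number of judges (sets of ranks)
n = len(rank_totals)                   # number of individuals ranked

expected = m * (n + 1) / 2             # rank sum expected under independence
s = sum((r - expected) ** 2 for r in rank_totals)
s_max = m ** 2 * (n ** 3 - n) / 12     # value of S under perfect agreement
w = s / s_max                          # formula (12.8)
mean_rho = (m * w - 1) / (m - 1)       # average rho between pairs of judges
print(expected, s, round(w, 3), round(mean_rho, 3))
```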

The concordance among m sets of ranks may be described by calculating
Spearman rank-order correlation coefficients between all possible pairs of
sets of ranks and finding the average value, denoted by ρ̄. This average is
related to W. The relation is given by

\[ \bar{\rho} = \frac{mW - 1}{m - 1} \tag{12.9} \]

For the particular case where m = 2 the relation is ρ̄ = 2W − 1. For
W = 0, ρ̄ = −1, for W = .5, ρ̄ = 0, and for W = 1, ρ̄ = 1.


12.9. The Coefficient of Concordance with Tied Ranks 


Where tied ranks occur, proceed as before and assign to each member the 
average rank which the tied observations occupy. If the ties are not 
numerous, we may compute W directly from the data without further 
adjustment. If the ties are numerous, a correction factor is calculated for 
each set of ranks. This correction factor is 


\[ T = \frac{\sum (t^3 - t)}{12} \tag{12.10} \]

where t is the number of observations in a group tied for a given rank.
For example, if the ranks on X are 1, 2.5, 2.5, 4, 5, 6, 8, 8, 8, 10, we have two
groups of ties, one of two ranks and one of three ranks. The correction
factor for this set of ranks for X is

\[ T = \frac{(2^3 - 2) + (3^3 - 3)}{12} = 2.5 \]

A correction factor T is calculated for each of the m sets of ranks, and these
are added together over the m sets to obtain ΣT. We then apply a formula
for W in which this correction factor is incorporated. The formula is

\[ W = \frac{S}{\frac{1}{12}m^2(N^3 - N) - m\sum T} \tag{12.11} \]


The application of this correction tends to increase the size of W. The 
correction has a small effect unless ties are quite numerous. 
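
The correction may be sketched in Python as follows. The fragment is illustrative; the function names are assumptions. The second function is written on the assumption that the corrected formula subtracts m times the summed correction factors from the denominator of W, which is the usual form of Kendall's correction.

```python
from collections import Counter

def tie_correction(ranks):
    """T = sum of (t^3 - t) / 12 over groups of t tied ranks in one set."""
    return sum(t ** 3 - t for t in Counter(ranks).values()) / 12

def concordance_with_ties(s, m, n, corrections):
    """W with the denominator reduced by m times the summed corrections."""
    return s / (m ** 2 * (n ** 3 - n) / 12 - m * sum(corrections))

x = [1, 2.5, 2.5, 4, 5, 6, 8, 8, 8, 10]
t_x = tie_correction(x)
print(t_x)

# With no ties every correction is zero and the uncorrected W is recovered.
w_no_ties = concordance_with_ties(160, 4, 6, [0, 0, 0, 0])
print(round(w_no_ties, 3))
```

Because the correction reduces the denominator, any positive ΣT makes the corrected W larger than the uncorrected value.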


12.10. Significance of the Coefficient of Concordance W 


For N of 7 or less, values of W required for significance at the 5 and 1 per
cent levels have been tabulated by Friedman (1940) and are reproduced in
Kendall (1955) and Siegel (1956). A useful adaptation of these tables is
given by Edwards (1954). Critical values of W depend both on m, the
number of sets of ranks, and on N, the number of ranks in each set. For
N greater than 7, a χ² test may be applied. Calculate the quantity

\[ \chi^2 = m(N - 1)W \tag{12.12} \]

This has a chi-square distribution with N − 1 degrees of freedom. For the
data of Table 12.4, S = 160, W = .571, m = 4, and N = 6. Reference to
Edwards’s table provides critical values of .505 and .621 for significance at
the 5 and 1 per cent levels. If we apply the chi-square test to the same data,
we obtain

\[ \chi^2 = 4(6 - 1)(.571) = 11.42 \]

For df = 6 − 1 = 5 the values of χ² required for significance are 11.07 and
15.09 at the 5 and 1 per cent levels, and as before we are led to the conclusion
of significant association at the 5 per cent level. Of course, in this case the
tabled values are to be preferred because N is less than 7. For N less than 7
the chi-square test will provide a very rough estimate of the required proba- 
bilities. Other procedures for testing the significance of W exist. For a 
more thorough discussion of this problem see Edwards (1954). 
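
The chi-square approximation of formula (12.12) may be sketched in Python. The fragment is illustrative; the critical values are those quoted in the text for 5 degrees of freedom, and the variable names are assumptions. W is carried at full precision here, so the result differs slightly in the second decimal from the value obtained with W rounded to .571.

```python
m, n, s = 4, 6, 160
w = 12 * s / (m ** 2 * (n ** 3 - n))   # coefficient of concordance
chi2 = m * (n - 1) * w                 # formula (12.12)
df = n - 1

critical_05, critical_01 = 11.07, 15.09  # from the text, df = 5
print(round(chi2, 2), df, chi2 >= critical_05, chi2 >= critical_01)
```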


12.11. The Coefficient of Consistence K 


To obtain a ranking of objects on an attribute, the objects may be pre- 
sented two at a time in all possible pairs and a judge required to make a 
choice on the presentation of each pair. Thus a choice is made between 
every object and every other object. This procedure is known as the 
method of paired comparisons and has been widely used in psychological work. 
The method is usually assumed to yield a more reliable ordering than that 
obtained by requiring a judge to order a whole group of objects directly. 
The number of possible pairs is the number of combinations of N things taken
two at a time, or N(N − 1)/2. As N increases, the number of comparisons
increases very rapidly; consequently for large N the method is frequently
impractical.

In the method of paired comparisons we may wish to ascertain the con-
sistency of the choices made. Let A, B, and C be three objects. If A is
preferred to B and B is preferred to C, consistency of judgment would require
that A be preferred to C. If C is preferred to A, this latter choice is clearly
inconsistent with the two previous choices. What meaning attaches to the 
presence of inconsistent choices? Let A, B, and C be red, blue, and yellow 
cards, each of a different saturation. A judge may prefer red to blue, blue to 
yellow, and then may indicate a preference of yellow to red. This incon- 
sistent choice may result because the judge may be unable to discriminate 
and may indicate preferences in a more or less haphazard fashion. Many 
inconsistent choices in the method of paired comparisons result because the 
task requires a refinement of discrimination which is beyond the capacity of 
the judge. Inconsistent responses may also arise because the dimension 
of judgment has changed. The red card may be preferred to the blue and 
the blue to the yellow, on the basis of hue. The yellow may be preferred to
the red on the basis of saturation. A different dimension is used as a basis of
choice and leads to the presence of an inconsistency. To illustrate further, 
an orange may be preferred to a peach because of its color, and a peach may 
be preferred to a pear because of its flavor, a pear may be preferred to an 
orange because of its shape, and an inconsistency arises. Where incon- 
sistencies are numerous, a question may attach to the meaning of the rank 
ordering of objects obtained. It is convenient to represent a choice of A in
preference to B by the notation A → B and a choice of B to A by B → A.
The sequence A → B → C → A is an inconsistent triplet, or triad, of choices.
For any set of paired comparisons between N objects the number of
inconsistent triads may be counted and used to define a coefficient of consistency
of response. 

Responses obtained by the method of paired comparisons may be repre-
sented in tabular fashion in the form of a response pattern as shown in
Table 12.5.

TABLE 12.5
RESPONSE PATTERN FOR PAIRED COMPARISONS BETWEEN NINE OBJECTS AND
CALCULATION OF COEFFICIENT OF CONSISTENCE

[Response pattern: 9 X 9 array of 1's and 0's for objects A to I, with row
sums R; cell entries not recoverable.]

R̄ = 4          Σ(R − R̄)² = 30

\[ s_R^2 = \frac{\sum (R - \bar{R})^2}{N} = \frac{30}{9} = 3.33 \]

\[ K = s_R^2 \times \frac{12}{N^2 - 1} = 3.33 \times \frac{12}{80} = .500 \]

This table shows paired comparisons between nine objects,
A, B, C, . . . , H, I. A is preferred to B, and a 1 is entered in the cell
corresponding to row A and col. B above the main diagonal. A comple-
mentary zero is entered in col. A and row B below the main diagonal. All
other choices may be similarly represented. We note that where no response 
inconsistencies are present, all entries on one side of the main diagonal are 
1’s and all entries on the other side 0’s. In Table 12.5 the presence of some 
0’s above the main diagonal and the complementary 1’s below it indicates the
presence of inconsistencies. Let us now sum the rows of Table 12.5. If no
inconsistencies were present, the row sums would be the numbers 8, 7, 6, 5, 4,
3, 2, 1, 0. Because of the presence of inconsistencies the actual obtained
numbers are 7, 6, 5, 5, 4, 3, 3, 2, 1, although not in that order. The effect
of inconsistencies is to reduce the variability of the numbers obtained by
adding up the rows of the response pattern. Denote a row sum by R.
The mean of the row sums is R̄ = ΣR/N, which may be shown equal to
(N − 1)/2. The variance of the row sums is

\[ s_R^2 = \frac{\sum (R - \bar{R})^2}{N} = \frac{\sum R^2}{N} - \frac{(N - 1)^2}{4} \tag{12.13} \]

It is appropriate to inquire about the maximum and minimum values of the
variance s_R². The maximum value of s_R² occurs where no inconsistencies are
present in the response pattern and is equal to (N² − 1)/12. The minimum
value of s_R² depends on whether N is odd or even. If N is odd, the minimum
value of s_R² is zero. All the row sums are the same and are equal to (N − 1)/2.
This is the expected value of R when all choices are made at random. If N
is even, it may be shown that the minimum value of s_R² is not zero but is ¼
(Kendall, 1943). We then define a coefficient of consistence of response K as
the ratio of the observed value of s_R² to the maximum range of s_R², which is
the difference between the maximum and minimum values. Thus

\[ K = \frac{s_R^2}{\max s_R^2 - \min s_R^2} = \frac{\text{variance of row sums}}{\text{range of variance of row sums}} \tag{12.14} \]

Simple substitution shows that if N is odd,

\[ K = s_R^2 \, \frac{12}{N^2 - 1} \tag{12.15} \]

and if N is even,

\[ K = s_R^2 \, \frac{12}{N^2 - 4} \tag{12.16} \]


This is Kendall's coefficient of consistence. It has an expected value of 0
where responses are assigned at random, the case of maximal inconsistency,
and 1 where no inconsistencies are present.

The calculation of K is observed to be remarkably simple. Calculate the
variance of the row sums, s_R², of the response pattern and multiply this by
the factor 12/(N² − 1) if N is odd and by 12/(N² − 4) if N is even. For the
data of Table 12.5, s_R² = 3.33. In this example N is odd and K = .500.


How may the coefficient K be interpreted? The number of inconsistent
triads of the kind A → B → C → A may be denoted by d, which is related to
s_R² by the expression

\[ d = \frac{(N^3 - N)/12 - N s_R^2}{2} \tag{12.17} \]

In the example of Table 12.5, d, the number of inconsistencies, is found to be
15. The maximum possible number occurs where s_R² = 0 and is 30. Thus
half the triadic relations of the kind A → B → C → A are inconsistent, the
other half consistent, and K = .50. A K of .75 would be interpreted to
mean that one-quarter of the relations were inconsistent and three-quarters
consistent. A K of .20 would mean that four-fifths of the relations were
inconsistent and one-fifth consistent.

While the coefficient of consistence is obviously of limited application and
has not as yet been widely used by psychologists, it provides an excellent
illustration of the nature of the logical processes involved in the definition of
descriptive statistical measures.
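
The calculation of K and d from the row sums of a response pattern may be sketched in Python. The fragment is illustrative; the function name is an assumption, and the row sums are those reported for Table 12.5. The multiplying factor follows the rule stated in the text for odd and even N.

```python
def consistence(row_sums):
    """Return (variance of row sums, K, number of inconsistent triads d)."""
    n = len(row_sums)
    mean = sum(row_sums) / n
    variance = sum((r - mean) ** 2 for r in row_sums) / n
    factor = 12 / (n ** 2 - 1) if n % 2 else 12 / (n ** 2 - 4)
    k = variance * factor
    d = ((n ** 3 - n) / 12 - n * variance) / 2   # formula (12.17)
    return variance, k, d

row_sums = [7, 6, 5, 5, 4, 3, 3, 2, 1]  # Table 12.5, order immaterial
variance, k, d = consistence(row_sums)
print(round(variance, 2), round(k, 3), round(d))
```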


12.12. The Significance of the Coefficient of Consistence 


The significance of the coefficient of consistence may be approached by 
considering the distribution of the number of triadic relations where choices 
are made at random. Kendall (1955) provides a table of probabilities that 
particular values of d will be attained or exceeded for N = 2 to 7. For 
N > 7, Kendall has shown that a χ² test may be used which provides
approximate probabilities. The quantity

\[ \chi^2 = \frac{8}{N - 4} \left( \frac{1}{4} C_3^N - d + \frac{1}{2} \right) + df \tag{12.18} \]

has an approximate χ² distribution with degrees of freedom given by

\[ df = \frac{N(N - 1)(N - 2)}{(N - 4)^2} \tag{12.19} \]

The term C_3^N in the expression for χ² is the number of combinations of N
things taken three at a time, or N!/[3!(N − 3)!]. In using this test the
required probability that a value of d equal to or greater than that obtained
will result where choices are allotted at random is the complement of the
probability for χ².

For the data of Table 12.5, N = 9 and d = 15. We have

\[ df = \frac{9 \times 8 \times 7}{(9 - 4)^2} = 20.16 \]

\[ \chi^2 = \frac{8}{9 - 4} \left( \frac{1}{4} \cdot \frac{9!}{3!\,6!} - 15 + \frac{1}{2} \right) + 20.16 = 28.9 \]

The probability associated with this value of χ² is greater than .99. This
means that the significance level for d is less than .01, the complement of .99.
We conclude that the consistency represented in the data is greater than we
could reasonably expect on the assignment of choices at random. The
coefficient of consistence K = .50 may be said to be significantly different
from zero at better than the .01, or 1 per cent, level. 
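
Expressions (12.18) and (12.19) may be evaluated mechanically. The following Python sketch is illustrative only; the function name is an assumption, and the expressions are coded in the form usually attributed to Kendall, with a continuity term of one-half. The resulting χ² must still be referred to a chi-square table with the (generally fractional) degrees of freedom returned.

```python
from math import comb

def consistence_chi_square(n, d):
    """Return (chi-square, degrees of freedom) for d inconsistent triads."""
    df = n * (n - 1) * (n - 2) / (n - 4) ** 2
    chi2 = 8 / (n - 4) * (comb(n, 3) / 4 - d + 0.5) + df
    return chi2, df

chi2, df = consistence_chi_square(9, 15)
print(round(df, 2), round(chi2, 2))
```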


EXERCISES 
1. The following are paired ranks: 
X 1 „ЖАКУ Р Em 8 
Y vpila eR EO aae TA TOR 227 


Compute Spearman's and Kendall’s rank-order coefficients. Do the coefficients 
obtained differ significantly from zero? 
2. Convert the following measurements to ranks: 


X 4 4 7 7 7 9 16 15310721 25 
Y 8 16 8 8 16 20 12 15 25 20 


Compute Spearman’s and Kendall's rank-order coefficients. Do the coefficients 
obtained differ significantly from zero?
3. Is a value of ρ = .30 where N = 25 significantly different from zero?
4. Three judges rank order a group of seven students on an examination as follows: 


Judge Student 
a b c d e f g
A 1 2 3 4 5 6 7 
B 2 LEE ANE. 1 7 6 
C 5 ETE 27 HS 6 7 


Compute the Spearman rank coefficients between judges and the coefficient of concord- 
ance. Check the calculation using formula (12.9). 

5. A supervisor rank orders six employees A, B, C, D, E, and F on job performance using
the method of paired comparisons. The data are as follows: A → B, A → C, A → D,
E → A, F → A, B → C, D → B, B → E, B → F, C → D, C → E, F → C, D → E, D → F,
E → F. Calculate the coefficient of consistence for these data. How may this
coefficient be interpreted?


CHAPTER 13 


OTHER VARIETIES OF CORRELATION 


13.1. Introduction 


We have hitherto considered product-moment correlation for use with 
continuous variables of the interval and ratio type. We have considered 
also rank-order correlation methods for use with ordinal data. Many other 
varieties of correlation have been developed. These have application to 
particular types of problems. In many instances, although not all, these 

are particular cases of the more general product-moment correlation and 
are derived on the basis of particular conditions or assumptions. In this 
chapter we shall discuss the contingency coefficient, the phi coefficient or 
fourfold point correlation, point biserial and biserial correlation, tetrachoric 
correlation, and the correlation ratios. The contingency coefficient is a 
descriptive measure of the association between nominal variables. The phi 
coefficient is applicable to 2 X 2 tables when the dichotomous variables are 
assumed to be discrete. Point biserial and biserial correlation are applicable 
to tables comprised of 2 columns and R rows, R > 2. Point-biserial cor-
relation assumes that the two-categoried variable is discrete. Biserial cor-
relation assumes that the two-categoried variable is in fact continuous and
normally distributed. Tetrachoric correlation is a form of correlation for use
with 2 X 2 tables, which in many instances may be reductions of larger
tables. It assumes that both underlying variables are normally distributed. 
The correlation ratios are applicable when the regression lines are nonlinear. 


13.2. The Contingency Coefficient 


The contingency coefficient is a nominal statistic. It is a descriptive 
measure of association between nominal variables. It may be calculated on
tables comprised of any number of rows and columns, greater, of course,
than 1. As a nominal statistic it is independent of the ordering of the rows
and columns of the contingency table. The arrangement of the rows and
columns may be changed, and the numerical value of the coefficient remains
unaltered. The formula for the contingency coefficient is usually stated in
terms of χ². As before [Eq. (11.1)], we define χ² as

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

where O is the observed and E the expected cell frequencies. The contin-
gency coefficient is then given by

\[ C = \sqrt{\frac{\chi^2}{N + \chi^2}} \tag{13.1} \]

where N is the total number of observations.

In Table 11.7 of Chap. 11, a 3 X 3 contingency table is presented showing
the relationship between eye and hand laterality. For this table N = 413
and χ², as calculated in Table 11.8, is 4.02. The contingency coefficient for
these data is

\[ C = \sqrt{\frac{4.02}{413 + 4.02}} = .098 \]

A value of C = .098 indicates almost a complete absence of association
between eye and hand laterality. 
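
The contingency coefficient is readily computed once χ² is in hand. The following Python sketch is illustrative; the function name is an assumption, and the values of χ² and N are those quoted above for the laterality data.

```python
from math import sqrt

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (N + chi2)), formula (13.1)."""
    return sqrt(chi2 / (n + chi2))

c = contingency_coefficient(4.02, 413)
print(round(c, 3))
```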

The minimum value of C is zero. C is zero when the two variables are 
independent. C cannot take negative values. The concepts of positive and 
negative imply direction based on an ordering of categories or classes. For 
a strictly nominal variable, the concept of order is without meaning. In 
many practical instances, where contingency coefficients are used, an order 
is observed in the data. If, for example, left-handedness were associated 
with left-eyedness, this might be considered a positive association. If, 
however, left-handedness were associated with right-eyedness, this might be 
considered a negative association. Some investigators may choose to attach 
a positive or negative sign to a contingency coefficient to indicate direction 
when this has meaning in relation to the data. 

The maximum value of the contingency coefficient depends on the number 
of categories of the variables. For square contingency tables the number of 
rows is equal to the number of columns and the maximum value of C is 
given by 

\[ C_{\max} = \sqrt{\frac{k - 1}{k}} \tag{13.2} \]

where k is the number of arrays, either rows or columns. Thus for a 2 X 2
table the upper limit of C is √(1/2) = .707; for a 3 X 3 table the
maximum value is √(2/3) = .816. Maximum values for k = 2 to k = 10 are
as follows:

Number of categories
for both variables        Maximum C
        2                   .707
        3                   .816
        4                   .866
        5                   .894
        6                   .913
        7                   .926
        8                   .935
        9                   .943
       10                   .949


We observe that as the number of categories increases, the maximum value of C approaches 1 as a
limit. The dependence of C on the number of categories raises difficulties 
of interpretation. It means that different values of C are not directly 
comparable unless based on tables having the same number of rows and 
columns. Thus a contingency coefficient based on a 2 X 2 table may be 
compared directly with one based on another 2 X 2 table. It is not directly 
comparable, however, with one based on a 3 X 3 or 3 X 4 table. 
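
The tabled maxima follow directly from the expression √((k − 1)/k) for square tables. A short Python sketch, illustrative only and with an assumed function name, reproduces them.

```python
from math import sqrt

def max_contingency(k):
    """Upper limit of C for a square k X k contingency table."""
    return sqrt((k - 1) / k)

maxima = {k: round(max_contingency(k), 3) for k in range(2, 11)}
for k, value in maxima.items():
    print(k, f"{value:.3f}")
```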

The sampling distribution of the contingency coefficient is a matter of 
some complexity. To test the significance of an obtained value of C a 
knowledge of its sampling distribution is unnecessary. To compute C we
require χ². We may test the significance of C by consulting a table to ascer-
tain whether or not the χ² is significant.

In computing C, considerations pertaining to small cell frequencies in
relation to χ², as described in Sec. 11.6, apply.


13.3. The Phi Coefficient 


The phi coefficient, or fourfold point correlation, is applicable to 2 X 2
tables only. It is related to χ². The two dichotomous variables are assumed
to be discrete, and the two categories of each to be amenable to appropriate
representation by two point values. In practice it is widely used when the
two variables are obviously not discontinuous.

One formula for calculating the phi coefficient, or φ, is

\[ \phi = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \tag{13.3} \]

where A, B, C, and D are the four cell frequencies. The term in the denomi-
nator of the above expression is the square root of the product of the four
marginal totals.

Table 13.1 shows a 2 X 2 table illustrating the relationship between two
psychological test items. The value of φ based on this table is .376. The
reader will note that in this example the two underlying variables may be
regarded as continuous. The categories “pass” and “fail” may be con-
sidered a dichotomy of an underlying continuous ability variable. Indi-
viduals above a certain threshold value on the ability variable pass the item;
those below it fail the item.

TABLE 13.1
COMPUTATION OF PHI COEFFICIENT OF CORRELATION BETWEEN TWO TEST ITEMS

Frequency
                        Item 2
                    Fail     Pass     Total
Item 1    Pass       11       19        30
          Fail       15        5        20
          Total      26       24        50

Proportion
                        Item 2
                    Fail     Pass     Total
Item 1    Pass      .22      .38      .60  (p_i)
          Fail      .30      .10      .40  (q_i)
          Total     .52      .48     1.00
                   (q_j)    (p_j)

\[ \phi = \frac{19 \times 15 - 11 \times 5}{\sqrt{30 \times 20 \times 24 \times 26}} = .376 \]


The phi coefficient is related to χ² calculated on a 2 X 2 table by the
expression

\[ \phi = \sqrt{\frac{\chi^2}{N}} \qquad \text{or} \qquad \chi^2 = N\phi^2 \tag{13.4} \]

Any formula for calculating χ² for a 2 X 2 table may with minor modifica-
tion be used for calculating φ.

Alternative formulas for computing φ may be stated. In psychological-
test statistics it is conventional to represent the proportion passing item i
by p_i and those failing by q_i, where p_i = 1 − q_i. Similarly, the proportion
passing item j is p_j and the proportion failing q_j. The proportion passing
both items i and j is represented by p_ij. The φ coefficient of correlation
between two test items may then be written as

\[ \phi = \frac{p_{ij} - p_i p_j}{\sqrt{p_i q_i p_j q_j}} \tag{13.5} \]

For the example of Table 13.1, p_ij = .38 and the phi coefficient is

\[ \phi = \frac{.38 - .60 \times .48}{\sqrt{.60 \times .40 \times .48 \times .52}} = .376 \]

which checks with the result previously obtained. When one of the variables
is evenly divided, p_i = q_i = .50, the formula for φ simplifies to

\[ \phi = \frac{2p_{ij} - p_j}{\sqrt{p_j q_j}} \tag{13.6} \]

When both variables are evenly divided and p_i = q_i = p_j = q_j = .50, the
formula becomes

\[ \phi = 4p_{ij} - 1 \tag{13.7} \]


The phi coefficient has been widely used in statistical work associated with 
psychological tests. Usually when investigators speak of the correlation 
between dichotomously scored test items, the reference is to the phi coefficient. 
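
The agreement between the formulas may be checked numerically. The Python sketch below is illustrative only; the variable names are assumptions. It uses the cell frequencies implied by the worked example (19, 15, 11, and 5, with marginal totals 30, 20, 24, and 26), taking A and B as the upper row and C and D as the lower row of the fourfold table.

```python
from math import sqrt

# Item 1 in rows, Item 2 in columns (fail, pass).
a, b = 11, 19   # Item 1 pass: Item 2 fail, Item 2 pass
c, d = 15, 5    # Item 1 fail: Item 2 fail, Item 2 pass
n = a + b + c + d

# Cell-frequency formula: (BC - AD) over root of product of marginals.
phi_cells = (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Proportion formula: (p_ij - p_i p_j) / sqrt(p_i q_i p_j q_j).
p_i, p_j, p_ij = (a + b) / n, (b + d) / n, b / n
q_i, q_j = 1 - p_i, 1 - p_j
phi_props = (p_ij - p_i * p_j) / sqrt(p_i * q_i * p_j * q_j)

chi2 = n * phi_cells ** 2   # relation between phi and chi-square
print(round(phi_cells, 3), round(phi_props, 3), round(chi2, 2))
```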

The phi coefficient is a particular case of the product-moment correlation
coefficient. If we assign integers, say, 1 and 0, to represent the two
categories of each variable and calculate the product-moment correlation
coefficient in the usual way, the result will be identical with $\phi$. For
example, on psychological-test items a 1 may be assigned for a pass and a 0
for a failure. On two items we obtain a set of $N$ paired observations, the
variables being restricted to the values 1 and 0. The mean and standard
deviation of the two variables may be calculated. The mean of item $i$ is
observed to be $p_i$. The standard deviation can be shown to be
$s_i = \sqrt{p_i q_i}$. The usual formula for a product-moment correlation
coefficient is $r = \sum (X - \bar{X})(Y - \bar{Y})/N s_x s_y$. The term
$\sum (X - \bar{X})(Y - \bar{Y})/N$ reduces, where the variables can take
values of only 1 and 0, to $p_{ij} - p_i p_j$, and the correlation becomes

$$\phi = \frac{p_{ij} - p_i p_j}{\sqrt{p_i q_i p_j q_j}}$$
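The identity between phi and the product-moment correlation can be checked numerically. The sketch below is illustrative only: it takes the cell frequencies of Table 13.1 (11, 19, 15, 5), expands them into 50 paired scores of 1 and 0, and compares the ordinary product-moment correlation with phi computed from the cells. The function names `phi_from_cells` and `pearson_r` are assumptions introduced here, not part of the text.

```python
from math import sqrt

# Cell frequencies of Table 13.1 (rows: item 1, columns: item 2).
a, b = 11, 19   # item 1 pass: item 2 fail, item 2 pass
c, d = 15, 5    # item 1 fail: item 2 fail, item 2 pass


def phi_from_cells(a, b, c, d):
    """Phi for a 2 x 2 table in which b and c are the concordant cells."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Expand the table into N = 50 paired scores, 1 = pass and 0 = fail.
item1 = [1] * (a + b) + [0] * (c + d)
item2 = [0] * a + [1] * b + [0] * c + [1] * d

phi = phi_from_cells(a, b, c, d)
r = pearson_r(item1, item2)
print(round(phi, 3), round(r, 3))
```

Both values agree with the .376 obtained above, as the argument requires.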


The phi coefficient has a minimum value of -1 in the case of perfect
negative and a maximum value of +1 in the case of perfect positive
association. These limits, however, can be attained only when the two
variables are evenly divided; that is, $p_i = q_i = p_j = q_j = .50$. When the
variables are the same shape, $p_i = p_j$ and $q_i = q_j$, but are asymmetrical,
$p_i \neq q_i$ and $p_j \neq q_j$, one or the other of the limits -1 or +1 may
be attained but not both. The maximum and minimum values of phi are clearly
influenced by the marginal totals. Consider the following 2 × 2 tables:

[Tables 1 to 4, four 2 × 2 tables, are not recoverable from the source.]


In tables 1 and 2 both variables are evenly divided and coefficients of +1
and -1 are possible. Table 3 represents the maximum positive association
possible, given the restriction of the marginal totals. The phi coefficient is
.613. Table 4 shows the most extreme negative association possible with the
same marginal totals. The phi coefficient is -.403. For this particular set
of marginal totals phi can extend from a minimum of -.403 to a maximum
of .613.
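The dependence of the limits on the marginal totals may be illustrated with a short sketch. It assumes a 2 × 2 table described by its pass and fail totals for each variable; the function name `phi_limits` is hypothetical. Because the marginal totals of tables 3 and 4 are not available here, the example uses the marginal totals of Table 13.1 (30, 20 and 24, 26) instead, so the limits printed are not the .613 and -.403 quoted above.

```python
from math import sqrt


def phi_limits(row_pass, row_fail, col_pass, col_fail):
    """Return (minimum phi, maximum phi) attainable with fixed marginal totals."""
    n = row_pass + row_fail
    if n != col_pass + col_fail:
        raise ValueError("row and column totals must sum to the same N")
    denom = sqrt(row_pass * row_fail * col_pass * col_fail)

    def phi(both_pass):
        # With the margins fixed, the pass-pass cell determines the other three.
        pass_fail = row_pass - both_pass
        fail_pass = col_pass - both_pass
        both_fail = n - both_pass - pass_fail - fail_pass
        return (both_pass * both_fail - pass_fail * fail_pass) / denom

    fewest = max(0, row_pass + col_pass - n)   # smallest possible pass-pass count
    most = min(row_pass, col_pass)             # largest possible pass-pass count
    return phi(fewest), phi(most)


even_lo, even_hi = phi_limits(50, 50, 50, 50)      # both variables evenly divided
uneven_lo, uneven_hi = phi_limits(30, 20, 24, 26)  # margins of Table 13.1
print(even_lo, even_hi)
print(round(uneven_lo, 3), round(uneven_hi, 3))
```

With evenly divided variables the limits are -1 and +1; with the unequal margins of Table 13.1 neither limit can be reached.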

While the influence of the marginal totals on the range of values of phi 
may in some of its applications prove to be a disadvantage, this effect is in 
no way inconsistent with correlation theory. If a correlation coefficient is 
viewed as a measure of the efficacy of prediction, then perfect prediction in 
both a positive and a negative direction is possible only when the two
distributions have the same shape and are symmetrical. If one variable is
normally distributed and the other is rectangular, perfect prediction of the
one from the other is not possible and the correlation coefficient reflects
this fact. Perfect prediction in one direction requires only identity of shape;
perfect prediction in both directions requires symmetry also. The phi 
coefficient, although affected by the marginal totals, is a measure of the 
efficacy of prediction. From this viewpoint it quite rightly reflects the loss 
in degree of prediction resulting from the lack of concordance of the two 
marginal distributions. 

Because $\chi^2 = N\phi^2$, we can readily test the significance of $\phi$ by
referring $N\phi^2$ to a chi-square table with 1 degree of freedom. When
$df = 1$, $\sqrt{\chi^2}$ is a normal deviate and we may refer $\phi\sqrt{N}$ to
tables of the normal curve. In sampling from a population where no association
exists, the distribution of $\phi$ should be approximately normal with a
standard error of $1/\sqrt{N}$. Of course, all considerations pertaining to
small frequencies (Sec. 11.6) apply here. $N$ should, clearly, not be too small.
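As a rough illustration of this test, the sketch below applies it to the phi of .376 with N = 50 from Table 13.1. The function name `phi_significance` is an assumption made here, and the two-tailed probability is taken from the unit normal curve by way of the complementary error function rather than from printed tables.

```python
from math import erfc, sqrt


def phi_significance(phi, n):
    """Chi square (1 df), normal deviate, and two-tailed probability for phi."""
    chi_square = n * phi ** 2                # formula (13.4)
    z = phi * sqrt(n)                        # normal deviate; z squared equals chi square
    p_two_tailed = erfc(abs(z) / sqrt(2))    # area in both tails of the unit normal curve
    return chi_square, z, p_two_tailed


chi_square, z, p_value = phi_significance(0.376, 50)
print(round(chi_square, 2), round(z, 2))
```

The chi square of about 7.07 exceeds the value of 6.64 required for significance at the .01 level with 1 degree of freedom.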


13.4. Point Biserial Correlation 


Point biserial correlation provides a measure of relationship between a 
continuous variable and a two-categoried, or dichotomous, variable. The 
data when arranged in a frequency distribution take the form of a table 
comprised of R rows and 2 columns. The dichotomous variable is assumed to 
be discrete. For example, the continuous variable may be scores on a
psychological test and the dichotomous variable may be male or female, or 
high school graduates and university graduates, or owning a television set 
and not owning a television set. Point biserial correlation is frequently 
applied in practice where the underlying dichotomous variable is not discrete. 
For example, “pass” or “fail” on a psychological-test item may be interpreted
to be a dichotomy of an underlying continuous ability variable. “Normal”
versus “neurotic” may be considered a somewhat arbitrary division of a
continuous neuroticism dimension. Success or failure in an occupation may be
viewed as a dichotomy of a continuous variable extending from exalted
achievement to abysmal defeat.

Point biserial correlation is a product-moment correlation and is a
particular case of the formula $r = \sum (X - \bar{X})(Y - \bar{Y})/N s_x s_y$.
If we assign a 1 to individuals in one category and a 0 to individuals in the
other and calculate the product-moment correlation, the result is a point
biserial coefficient. Weights other than 1 and 0 may be assigned to the
categories. The coefficient is in no way dependent on the weights assigned.

The formula for point biserial r is

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_q}{s_t} \sqrt{pq} \tag{13.8}$$

where $s_t$ = standard deviation of all scores on continuous variable
      $p$ and $q$ = proportions of individuals in two categories of dichotomous
            variable
      $\bar{X}_p$ and $\bar{X}_q$ = mean scores on continuous variable of
            individuals within the two categories
Thus if the continuous variable is a set of error scores on a maze test designed
to provide a measure of animal “intelligence,” and the two categories of the
dichotomous variable are samples of “dull” and “bright” strains of rats,
then $\bar{X}_p$ is the mean error score on the maze test of the dull and
$\bar{X}_q$ is the mean error score of the bright rats. In this example a high
error score means low intelligence. The direction of the correlation must be
determined by inspection of the data.

To illustrate the calculation of point biserial correlation from ungrouped
data consider Table 13.2. This table presents scores on an “anxiety”
inventory for a group of 14 individuals, 8 of whom are described as “normal”
and 6 as “neurotic.” The higher the score on the inventory, the greater the
anxiety. The mean inventory score $\bar{X}_p$ for the six neurotics is 38.15,
and the mean $\bar{X}_q$ for the eight normals is 23.88. A comparison of these
means suggests that the test discriminates between the two groups. The standard
deviation of inventory scores is 18.19. The proportions $p$ and $q$ of neurotics
and normals are .43 and .57. The point biserial correlation is .39.

In this example the point biserial correlation coefficient is a measure of 
the capacity of the anxiety inventory to discriminate between the two clinical 
groups. This statistic can always be interpreted as a measure of the degree 
to which the continuous variable differentiates, or discriminates, between the 
two categories of the dichotomous variable. The reader will note in Table 
13.2 that if the eight individuals making the lowest inventory scores were 
normals and the six making the highest scores were neurotics, the point 
biserial would be a maximum for these data. Also, if the labels “normal”
and “neurotic” were arranged more or less at random in relation to score,
the difference between $\bar{X}_p$ and $\bar{X}_q$, and also $r_{pbi}$, would
tend to zero.

TABLE 13.2
CALCULATION OF POINT BISERIAL CORRELATION FROM UNGROUPED DATA

Individual   Inventory score   Clinical description
     1              6          Normal
     2              8          Neurotic
     3              8          Normal
     4             11          Normal
     5             16          Neurotic
     6             25          Normal
     7             27          Normal
     8             31          Normal
     9             31          Neurotic
    10             39          Normal
    11             44          Normal
    12             50          Neurotic
    13             56          Neurotic
    14             68          Neurotic

Mean score for neurotics: $\bar{X}_p = 38.15$
Mean score for normals: $\bar{X}_q = 23.88$

$s_t = 18.19 \qquad p = .43 \qquad q = .57$

$$r_{pbi} = \frac{38.15 - 23.88}{18.19} \sqrt{.43 \times .57} = .39$$

An alternative method of calculating point biserial correlation is given by

$$r_{pbi} = \frac{\bar{X}_p - \bar{X}_t}{s_t} \sqrt{\frac{p}{q}} \tag{13.9}$$


where $\bar{X}_t$ is the mean of all scores on the continuous variable. This
formula requires less computation than the previous one where the data are
grouped in the form of a frequency distribution. Table 13.3 illustrates the use
of this formula in calculating a point biserial correlation between a test item
scored on a pass or fail basis and total scores on a psychological test.
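The two formulas, and the statement that the coefficient does not depend on the weights assigned to the categories, can be checked on the data of Table 13.2. The sketch below is illustrative; the helper names are assumptions, and the standard deviation is taken with divisor N. Recomputing directly from the fourteen listed scores gives a neurotic mean of about 38.17, a standard deviation of about 18.77, and a coefficient of about .38, slightly different from the printed values of 38.15, 18.19, and .39; the source of that difference is not clear from the text.

```python
from math import sqrt

# Inventory scores and clinical descriptions from Table 13.2.
scores = [6, 8, 8, 11, 16, 25, 27, 31, 31, 39, 44, 50, 56, 68]
neurotic = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1]   # 1 = neurotic, 0 = normal


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor N."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(x, y):
    """Ordinary product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return sxy / (sd(x) * sd(y))


group_p = [s for s, g in zip(scores, neurotic) if g == 1]
group_q = [s for s, g in zip(scores, neurotic) if g == 0]
p = len(group_p) / len(scores)
q = len(group_q) / len(scores)
s_t = sd(scores)

r_13_8 = (mean(group_p) - mean(group_q)) / s_t * sqrt(p * q)   # formula (13.8)
r_13_9 = (mean(group_p) - mean(scores)) / s_t * sqrt(p / q)    # formula (13.9)
r_product_moment = pearson_r(scores, neurotic)                 # weights 1 and 0
r_other_weights = pearson_r(scores, [7 if g else 3 for g in neurotic])

print(round(r_13_8, 3), round(r_13_9, 3), round(r_product_moment, 3))
```

All four quantities agree, which is the sense in which point biserial correlation is a particular case of the product-moment formula.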

TABLE 13.3
CALCULATION OF POINT BISERIAL CORRELATION BETWEEN A TEST ITEM AND
TOTAL TEST SCORES

[The body of this table, a frequency distribution of test scores for the
pass group and the total group with the computation columns for $\bar{X}_t$
and $s_t$, is not recoverable from the source. The legible results are
$\bar{X}_t = 44.20$ and $\bar{X}_p = 55.43$, computed from an arbitrary origin
of 54.5 with a class interval of 10.]

Point biserial correlation is not independent of the proportions in the two
categories. When $p = q = .50$, its maximum and minimum values will
differ from those when, say, $p = .20$ and $q = .80$. The maximum value of
$r_{pbi}$ never reaches +1; the minimum value never reaches -1. In predicting
a two-categoried variable from a continuous variable, perfect prediction is
possible and occurs when the two frequency distributions do not overlap.
Perfect prediction of a continuous variable from a two-categoried variable is
obviously impossible. Some error in prediction must always occur in predicting
a variable which may take a wide range of values from a variable which
may take two values only. The point biserial correlation coefficient reflects
this fact. It is worth noting here that the regression line obtained by
calculating the means of the two columns is of necessity linear, there being
only two points. The regression line obtained by calculating the means of
the rows cannot be linear except under certain special circumstances.

To test the significance of $r_{pbi}$ from zero the situation may be treated as
one requiring a comparison of the two means $\bar{X}_p$ and $\bar{X}_q$. The
appropriate value of $t$ may be written

$$t = r_{pbi} \sqrt{\frac{N - 2}{1 - r_{pbi}^2}} \tag{13.10}$$

The number of degrees of freedom is $N - 2$. This is a two-tailed test. For
large $N$ the quantity $1/\sqrt{N}$ may be used as the standard error of
$r_{pbi}$ in testing the significance of the difference from zero.
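The equivalence between formula (13.10) and a comparison of the two means can be seen by computing both on the data of Table 13.2. This is a sketch under the assumption that the comparison of means uses the usual pooled-variance t; the variable names are introduced here for illustration.

```python
from math import sqrt

scores = [6, 8, 8, 11, 16, 25, 27, 31, 31, 39, 44, 50, 56, 68]
neurotic = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1]   # 1 = neurotic, 0 = normal

group_p = [s for s, g in zip(scores, neurotic) if g == 1]
group_q = [s for s, g in zip(scores, neurotic) if g == 0]
n, n_p, n_q = len(scores), len(group_p), len(group_q)


def mean(values):
    return sum(values) / len(values)


# Point biserial r by formula (13.8), standard deviation with divisor N.
s_t = sqrt(sum((x - mean(scores)) ** 2 for x in scores) / n)
r_pbi = (mean(group_p) - mean(group_q)) / s_t * sqrt((n_p / n) * (n_q / n))

# Formula (13.10).
t_from_r = r_pbi * sqrt((n - 2) / (1 - r_pbi ** 2))

# Pooled-variance t for the difference between the two group means.
ss_within = (sum((x - mean(group_p)) ** 2 for x in group_p)
             + sum((x - mean(group_q)) ** 2 for x in group_q))
pooled_variance = ss_within / (n - 2)
t_from_means = (mean(group_p) - mean(group_q)) / sqrt(
    pooled_variance * (1 / n_p + 1 / n_q))

print(round(t_from_r, 3), round(t_from_means, 3))
```

Both routes give a t of about 1.41 with 12 degrees of freedom, which falls short of the 2.18 required at the .05 level.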


13.5. Biserial Correlation 


Biserial correlation is a measure of the relationship between a continuous 
and a dichotomous variable, it being assumed that the variable underlying 
the dichotomy is continuous and normal. If a bivariate table comprised of 
R rows and C columns is dichotomized and reduced to a table of R rows and 2 
columns, biserial correlation will be a more accurate estimate of the correla- 
tion based on the R X C table than point biserial correlation. One of its 
applications is in the selection of items for psychological tests. The biserial 
correlation of an item with total test score is frequently used as a measure of 
the discriminatory power of the item. 

The formula for calculating this coefficient is

$$r_{bi} = \frac{\bar{X}_p - \bar{X}_q}{s_t} \cdot \frac{pq}{y} \tag{13.11}$$

where $\bar{X}_p$ and $\bar{X}_q$ = mean scores on continuous variable of
            individuals in two categories
      $p$ and $q$ = proportions in two categories
      $s_t$ = standard deviation of all scores
      $y$ = height of ordinate of unit normal curve at point of
            division between $p$ and $q$ proportions of cases
Thus if $p = .30$ and $q = .70$, by consulting the table of areas and ordinates
of the normal curve, Table A of the Appendix, we can ascertain that the
height of the ordinate $y$ at the point of dichotomy is .348.

For the data of Table 13.2 we may, for illustrative purposes, assume that
the normal-versus-neurotic dichotomy is a division of a normally distributed
continuous variable. This assumption may or may not be warranted in fact.
For these data $p = .43$ and $q = .57$. The height of the ordinate of the unit
normal curve at the point of dichotomy is $y = .393$, $\bar{X}_p = 38.15$,
$\bar{X}_q = 23.88$, $s_t = 18.19$, and

$$r_{bi} = \frac{38.15 - 23.88}{18.19} \times \frac{.43 \times .57}{.393} = .49$$


An alternative formula for biserial correlation is

$$r_{bi} = \frac{\bar{X}_p - \bar{X}_t}{s_t} \cdot \frac{p}{y} \tag{13.12}$$

where $\bar{X}_t$ is the mean score for the total sample. Applying this formula
to the data of Table 13.3, we obtain $r_{bi} = .834$.
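The ordinate y need not be read from a table; it can be obtained from the unit normal curve directly. The following sketch uses Python's standard `statistics.NormalDist` for this purpose and applies formula (13.11) to the summary values printed for Table 13.2. The function names `ordinate_at_split` and `biserial_r` are assumptions made for illustration.

```python
from statistics import NormalDist


def ordinate_at_split(p):
    """Height of the unit normal curve at the point that cuts off proportion p."""
    unit = NormalDist()
    return unit.pdf(unit.inv_cdf(p))


def biserial_r(mean_p, mean_q, s_t, p):
    """Formula (13.11): (mean_p - mean_q) / s_t * pq / y."""
    q = 1 - p
    return (mean_p - mean_q) / s_t * (p * q) / ordinate_at_split(p)


y_30 = ordinate_at_split(0.30)                  # dichotomy at p = .30, q = .70
r_bi = biserial_r(38.15, 23.88, 18.19, 0.43)    # printed summary values of Table 13.2
print(round(y_30, 3), round(r_bi, 2))
```

Because the normal curve is symmetrical, the ordinate is the same whether p or q is supplied.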


Theoretically, the maximum and minimum values of $r_{bi}$ are independent
of the point of dichotomy and are +1 and -1. An implicit assumption
underlying this statistic is that the continuous many-valued variable is
normal, as well as the variable underlying the dichotomy. Values of $r_{bi}$
greater than unity can occur under gross departures from normality.

Some difficulties surround the sampling distribution of $r_{bi}$. The standard
error of $r_{bi}$ in sampling from a population where the correlation is zero is
roughly

$$s_{r_{bi}} = \frac{\sqrt{pq}}{y\sqrt{N}} \tag{13.13}$$

When $N$ is large this formula may be used with reference to the normal curve
to test the significance of $r_{bi}$. It should, however, be used with caution,
because the probabilities thereby obtained are somewhat inaccurate. The
standard error tends to increase with the extremeness of the dichotomies.
The reader may wish to compare the standard error of $r_{bi}$ with the
corresponding large-sample formula for the ordinary product-moment correlation
$s_r = 1/\sqrt{N}$. The standard error of $r_{bi}$ is always larger than the
standard error of the ordinary product-moment correlation. Where $p = q = .5$,
the standard error of $r_{bi}$ is 1.25 times as large as the standard error of
$r$. Where $p = .90$ and $q = .10$, the standard error of $r_{bi}$ is 1.71 times
that of $r$. For further discussion of the sampling distribution of $r_{bi}$,
see Walker and Lev (1953).

The relation between biserial and point biserial correlation is given by the
expression

$$r_{bi} = r_{pbi} \frac{\sqrt{pq}}{y} \tag{13.14}$$

The factor $\sqrt{pq}/y$ varies from 1.25 where $p = q = .5$ to 3.73 where
$p = .99$ and $q = .01$. Thus $r_{bi}$ is always greater than $r_{pbi}$ and the
difference increases with extremeness of the dichotomies.
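The factor in formula (13.14) is the same quantity that multiplies $1/\sqrt{N}$ in formula (13.13), so one small function reproduces both sets of figures quoted above. The sketch is illustrative; `biserial_factor` is a name assumed here, and the ordinate is taken from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def biserial_factor(p):
    """sqrt(pq) / y, where y is the unit normal ordinate at the point of dichotomy."""
    unit = NormalDist()
    y = unit.pdf(unit.inv_cdf(p))
    return sqrt(p * (1 - p)) / y


for p in (0.50, 0.90, 0.99):
    print(p, round(biserial_factor(p), 2))
```

The printed factors are 1.25, 1.71, and 3.73, matching the values given in the text for the standard-error comparison and for the ratio of biserial to point biserial r.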


13.6. Tetrachoric Correlation 


Tetrachoric correlation is appropriate to data arranged in a 2 X 2, or 
fourfold, table. It assumes that both variables underlying the dichotomies 
are normally distributed. It has been used to provide a convenient measure 
of correlation when graduated measurements have been reduced to two 
categories. It is an estimate of product-moment correlation. The tetrachoric
correlation calculated on a 2 × 2 table should be roughly the same as that
calculated on the more highly graduated R × C table, when
the two variables are approximately normal in form.

Direct calculation of tetrachoric correlation coefficients is algebraically
complex and arithmetically laborious. Because of this, various approximate
methods and computation procedures have been devised. A commonly
used approximation is known as the cosine-pi formula, which may be written
in the form

$$r_t = \cos\left(\frac{180^\circ}{1 + \sqrt{BC/AD}}\right) \tag{13.15}$$

A, B, C, and D are the four cell frequencies. B and C are the high-high and
low-low and A and D the high-low and low-high cell frequencies. The reader
will recall that the cosine of an angle is the horizontal side of a right-angle
triangle divided by the hypotenuse, the side opposite the right angle. The
quantity in the parentheses of the above formula is an angle, and its cosine
is an estimate of the tetrachoric correlation. When $AD = 0$, the case of
perfect positive correlation, the quantity $\sqrt{BC/AD}$ is indefinitely large
and $r_t = \cos 0^\circ$. A table of trigonometric functions shows that the
cosine of a zero angle is +1.00. When $BC = 0$, the case of perfect negative
correlation, the quantity $\sqrt{BC/AD} = 0$ and $r_t = \cos 180^\circ$. The
cosine of a 180° angle is -1.00. When $BC = AD$, the case of independence,
$\sqrt{BC/AD} = 1$ and $r_t = \cos 90^\circ$. The cosine of a 90° angle is zero.
If the angle is greater than 90°, the correlation is negative.
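The three limiting cases just described can be verified with a few lines of code. The sketch below is an illustration only; the function names are assumptions, and the ratio of 2.1406 is not taken from the table but back-calculated from the angle of 73.08° reported in Table 13.4.

```python
from math import cos, radians, sqrt


def cosine_pi_from_ratio(ratio):
    """Cosine-pi estimate of tetrachoric r from the ratio BC/AD, formula (13.15)."""
    return cos(radians(180.0 / (1.0 + sqrt(ratio))))


def cosine_pi(a, b, c, d):
    """Cosine-pi estimate from cell frequencies; b and c are the concordant cells."""
    if a * d == 0:
        if b * c == 0:
            raise ValueError("both cross products are zero; the estimate is undefined")
        return 1.0          # the ratio is indefinitely large, so the angle is zero
    return cosine_pi_from_ratio((b * c) / (a * d))


perfect_positive = cosine_pi(0, 10, 10, 0)
perfect_negative = cosine_pi(10, 0, 0, 10)
independence = cosine_pi(5, 5, 5, 5)
table_13_4 = cosine_pi_from_ratio(2.1406)   # assumed ratio, implied by 73.08 degrees
print(perfect_positive, perfect_negative, round(independence, 6), round(table_13_4, 3))
```

The case AD = 0 is handled separately because the ratio cannot be formed; the formula's limiting value of +1 is returned instead.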


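TABLE 13.4
CALCULATION OF TETRACHORIC CORRELATION USING COSINE-PI APPROXIMATION

[Fourfold table of test score (above median, below median) against occupation
rating (below average, above average), with cells labeled A, B, C, and D. Most
cell frequencies are not recoverable from the source; only the entries 76, in
the above-median row, and 55, in the below-median row, are legible.]

$$r_t = \cos 73.08^\circ = .291$$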


Table 13.4 illustrates the application of formula 13.15. The amount of
calculation is trivial. Tables have been prepared which enable the rapid
determination of the cosine-pi approximation of $r_t$ from the ratio $BC/AD$,
this being the only calculation required. Such tables are reproduced in
Guilford (1956) and Edwards (1954).

The cosine-pi formula provides an excellent approximation to the tetrachoric
correlation when the divisions of the two variables are equal, $p = q = .50$.
As the divisions depart from equality this formula tends to overestimate the
tetrachoric correlation. When the limits of the divisions are between .4
and .6, the discrepancy in estimation is quite small, its maximum value
being about .02. For extreme divisions the discrepancy is substantial. For
a method of estimating tetrachoric r where the divisions of the variables are
not close to the medians, the reader is referred to tables prepared by Jenkins
(1955). See also note by Fishman (1956).

A formula for the standard error of a tetrachoric correlation in sampling
from a population where the population value is zero is given by

$$s_{r_t} = \frac{\sqrt{p_1 q_1 p_2 q_2}}{y_1 y_2 \sqrt{N}} \tag{13.16}$$

where $y_1$ and $y_2$ are the heights of the ordinates of the unit normal curve
at the points of dichotomy, and $p_1$, $q_1$ and $p_2$, $q_2$ are the
proportions in the two categories for the two variables. While this formula may
be used with reference to the unit normal curve to test the significance of an
observed $r_t$, the procedure is somewhat dubious because uncertainty attaches
to the nature of the sampling distribution of $r_t$. To test the significance
of a correlation on a 2 × 2 table the investigator is on much safer ground
using $\chi^2$ rather than concerning himself with use of the standard error of
$r_t$. This formula, however, permits a comparison with the corresponding
large-sample standard-error formula for product-moment r, $s_r = 1/\sqrt{N}$.
The standard error of $r_t$ is always greater than the standard error of $r$.
The magnitude of the error increases with increase in the extremeness of the
dichotomies. When the population value of $r_t$ is zero and the two variables
are evenly split, $s_{r_t}$ is about 1.57 times as large as $s_r$. When
$p = .90$ and $q = .10$ for both variables, $s_{r_t}$ is about 2.92 times as
large as $s_r$. These considerations suggest that the use of tetrachoric
correlation is ill-advised when the dichotomies are extreme.
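The two ratios quoted above follow directly from formula (13.16). The sketch below reproduces them; the function names are assumed for illustration, the sample size of 100 is arbitrary (it cancels in the ratio), and the ordinates come from `statistics.NormalDist` rather than from Table A.

```python
from math import sqrt
from statistics import NormalDist


def ordinate_at_split(p):
    """Height of the unit normal curve at the point that cuts off proportion p."""
    unit = NormalDist()
    return unit.pdf(unit.inv_cdf(p))


def tetrachoric_se(p1, p2, n):
    """Formula (13.16): standard error of r_t when the population value is zero."""
    q1, q2 = 1 - p1, 1 - p2
    return sqrt(p1 * q1 * p2 * q2) / (
        ordinate_at_split(p1) * ordinate_at_split(p2) * sqrt(n))


def product_moment_se(n):
    """Large-sample standard error of the product-moment r, 1 / sqrt(N)."""
    return 1 / sqrt(n)


n = 100   # arbitrary; the ratio does not depend on N
ratio_even = tetrachoric_se(0.50, 0.50, n) / product_moment_se(n)
ratio_extreme = tetrachoric_se(0.90, 0.90, n) / product_moment_se(n)
print(round(ratio_even, 2), round(ratio_extreme, 2))
```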

Tetrachoric correlation has been used as a laborsaving device when large 
numbers of correlations are required. This procedure is quite acceptable 
when N is large. Also, under these conditions it is possible to dichotomize 
close to the medians of the two variables. In any situation, however, the 
reduction of a multicategoried to a two-categoried variable results in a loss 
of information, which in the case of $r_t$ is reflected in the standard error.


13.7. The Correlation Ratios 


The correlation ratios are descriptive of the relationship between variables 
when the regression lines are nonlinear. Although of some theoretical 
interest, they have been infrequently used by psychologists. A common
example of a nonlinear relationship occurs in the correlation of
psychological-test performance and chronological age when a broad age range is
covered. Performance usually shows a more rapid increase during the earlier
than the later years. The discussion of the correlation ratios given here is
rather cursory.

The reader will recall from Chap. 8 that the calculation of a product- 
moment correlation involves in effect the calculation of two regression lines. 
One line is used in predicting Y from X, and the other X from Y. A dis- 
crepancy between an observed value and a predicted value, a point on a 
regression line, is an error of estimate. If Y is an observed value and Y' is an 
estimate of it predicted from X, then the difference Y — Y’ is an error of 
estimate. The variance of these errors is an inverse measure of the efficacy 
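of predicting one variable from a knowledge of another. In predicting $Y$
from $X$ this variance is $s_{y \cdot x}^2 = \sum (Y - Y')^2/N$. In predicting
$X$ from $Y$ it is $s_{x \cdot y}^2 = \sum (X - X')^2/N$. With product-moment
correlation these two measures are equal when each is expressed as a proportion
of the corresponding total variance. The correlation coefficient is related to
these measures of error of estimate by

$$r^2 = 1 - \frac{s_{y \cdot x}^2}{s_y^2} = 1 - \frac{s_{x \cdot y}^2}{s_x^2}$$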


Consider now the following highly artificial bivariate frequency table:

                        X
             1    2    3    4    5    6   Total
    Y   4              10   10             20
        3         5              5         10
        2    1                        1     2
    Total    1    5   10   10    5    1    32


If two regression lines are fitted to the means of rows and columns in this
table, these lines will be at right angles to each other and the
product-moment correlation will be zero. Clearly, from a prediction viewpoint,
this correlation does not adequately describe the situation. In predicting $Y$
from $X$, perfect prediction is possible. If $X$ is 1, then $Y$ is 2; if $X$ is
2, then $Y$ is 3; and so on. In predicting $X$ from $Y$, however, prediction is
far from perfect. If $Y$ is 2, then $X$ may be either 1 or 6; if $Y$ is 3, then
$X$ may be either 2 or 5; if $Y$ is 4, then $X$ may be either 3 or 4. Thus the
prediction of $Y$ from $X$ is perfect, whereas the prediction of $X$ from $Y$ is
subject to gross error. This results from nonlinearity of regression, a
circumstance not unrelated to the shapes of the two marginal distributions.
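The contrast between the two directions of prediction can be made concrete by computing the product-moment correlation together with the squared correlation ratios, defined as 1 minus the ratio of the variance about array means to the total variance. The sketch below is illustrative. The cell frequencies are an assumed reading of the artificial table, chosen to agree with the description in the text (X of 1 goes with Y of 2, and so on), and the function names are introduced here.

```python
from math import sqrt

# (X, Y, frequency) for the occupied cells; an assumed reading of the table.
cells = [(1, 2, 1), (2, 3, 5), (3, 4, 10), (4, 4, 10), (5, 3, 5), (6, 2, 1)]


def pearson_r(cells):
    """Product-moment correlation from a bivariate frequency table."""
    n = sum(f for _, _, f in cells)
    mean_x = sum(x * f for x, _, f in cells) / n
    mean_y = sum(y * f for _, y, f in cells) / n
    sxy = sum(f * (x - mean_x) * (y - mean_y) for x, y, f in cells)
    sxx = sum(f * (x - mean_x) ** 2 for x, _, f in cells)
    syy = sum(f * (y - mean_y) ** 2 for _, y, f in cells)
    return sxy / sqrt(sxx * syy)


def eta_squared(triples):
    """Squared correlation ratio for (predictor, criterion, frequency) triples."""
    n = sum(f for _, _, f in triples)
    grand_mean = sum(c * f for _, c, f in triples) / n
    total_ss = sum(f * (c - grand_mean) ** 2 for _, c, f in triples)
    # Mean of the criterion within each array, one array per predictor value.
    sums, counts = {}, {}
    for x, c, f in triples:
        sums[x] = sums.get(x, 0) + c * f
        counts[x] = counts.get(x, 0) + f
    within_ss = sum(f * (c - sums[x] / counts[x]) ** 2 for x, c, f in triples)
    return 1 - within_ss / total_ss


r = pearson_r(cells)
eta_sq_yx = eta_squared(cells)                                # predicting Y from X
eta_sq_xy = eta_squared([(y, x, f) for x, y, f in cells])     # predicting X from Y
print(r, eta_sq_yx, eta_sq_xy)
```

For this table the product-moment correlation is zero, the correlation ratio for predicting Y from X is 1, and that for predicting X from Y is zero.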


In situations of this kind the correlation ratios may be used to describe the 
relationships between the variables. With product-moment correlation an 
error of estimate is a deviation from a straight regression line fitted to the 
means of rows or columns. With the correlation ratios an error of estimate 
is simply a deviation from the mean of a row or column. No regression 
lines are used. If $\bar{Y}_j$ is the mean of column $j$, and $Y_{ij}$ is the
score of the $i$th individual in column $j$, then the difference
$Y_{ij} - \bar{Y}_j$ is an error of estimate. The variance of the errors of
estimate in predicting $Y$ from $X$ may be written

$$s_{ay}^2 = \frac{\sum_j \sum_i (Y_{ij} - \bar{Y}_j)^2}{N} \tag{13.17}$$


This is simply the average of the squared deviations from the means of the
columns. The corresponding variance taken about the means of the rows is
$s_{ax}^2$. The correlation ratio which is descriptive of the prediction of $Y$
from $X$ is defined as

$$\eta_{yx}^2 = 1 - \frac{s_{ay}^2}{s_y^2} \tag{13.18}$$

and in predicting $X$ from $Y$ we have

$$\eta_{xy}^2 = 1 - \frac{s_{ax}^2}{s_x^2} \tag{13.19}$$


In the case of perfect linearity of regression, a circumstance which does
not arise in practice because of sampling error, $\eta_{yx}^2 = \eta_{xy}^2 =
r^2$. Where the regression lines are nonlinear, the two correlation ratios will
differ from each other and from the correlation coefficient. The correlation
ratio in general is equal to or greater than the correlation coefficient. Thus
$1 \geq \eta^2 \geq r^2$.

The discrepancy between $\eta_{yx}^2$ and $r^2$ is used as a measure of
nonlinearity of regression. The greater the difference, $\eta_{yx}^2 - r^2$, the
greater the departures from linearity. To test the significance of the
departures of a regression line from linearity, we calculate the quantity

$$F = \frac{(\eta_{yx}^2 - r^2)/(k - 2)}{(1 - \eta_{yx}^2)/(N - k)} \tag{13.20}$$

where $k$ = number of arrays, either rows or columns
      $N$ = total number of cases

This ratio is an $F$ ratio and may be referred to a table of $F$ with $k - 2$
degrees of freedom associated with the numerator and $N - k$ degrees of freedom
associated with the denominator. Note that two such tests may be applied to
any correlation table, one a test of the linearity of regression of $X$ on $Y$
and the other of $Y$ on $X$.


To test whether a correlation ratio is significantly different from zero we
may use the $F$ ratio:

$$F = \frac{\eta^2/(k - 1)}{(1 - \eta^2)/(N - k)} \tag{13.21}$$

This $F$ ratio has $k - 1$ degrees of freedom associated with its numerator and
$N - k$ degrees of freedom associated with its denominator. Quite obviously,
one correlation ratio may differ significantly from zero and the other may not.

Procedures for the practical computation of the correlation ratios are given
in most statistics texts (see, for example, Guilford, 1956).
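The two F ratios of formulas (13.20) and (13.21) are simple to compute once the squared correlation ratio, r, k, and N are known. The sketch below is an illustration only; the function names are assumptions, and the input values (a squared correlation ratio of .40, r of .50, 8 arrays, 100 cases) are hypothetical figures chosen for the example, not data from the text.

```python
def linearity_f(eta_sq, r, k, n):
    """Formula (13.20): F for departure from linearity, with its degrees of freedom."""
    f = ((eta_sq - r ** 2) / (k - 2)) / ((1 - eta_sq) / (n - k))
    return f, (k - 2, n - k)


def correlation_ratio_f(eta_sq, k, n):
    """Formula (13.21): F for a correlation ratio against zero, with its df."""
    f = (eta_sq / (k - 1)) / ((1 - eta_sq) / (n - k))
    return f, (k - 1, n - k)


# Hypothetical values for illustration.
f_linearity, df_linearity = linearity_f(eta_sq=0.40, r=0.50, k=8, n=100)
f_eta, df_eta = correlation_ratio_f(eta_sq=0.40, k=8, n=100)
print(round(f_linearity, 2), df_linearity)
print(round(f_eta, 2), df_eta)
```

Each F is then referred to a table of F with the degrees of freedom returned alongside it.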


EXERCISES 


1. What type of correlation coefficient is appropriate to describe the relation between
psychological-test scores and (a) sex, (b) age, (c) a pass-fail criterion?
2. The following are data on the correlation between responses to two test items: 


                     Item 2
                  Fail   Pass
   Item 1  Pass    40     30     70
   [The remaining rows of this table are not recoverable from the source.]

Compute the phi coefficient.
3. Compute for the data of Exercise 2 above the maximum and minimum values of phi. 
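4. The following are data on the correlation between test scores and responses on a test
item:

   Class interval    Item: Fail | Pass
   [Only the first row, class interval 30-34 with a frequency of 1, is legible;
   the remaining rows of this table are not recoverable from the source.]

Compute both the point biserial and the biserial correlation coefficients for these data.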
5. Dichotomize the test scores in Exercise 4 above to obtain a 2 X 2 table and calculate 
a tetrachoric correlation coefficient using the cosine-pi formula. 
6. Derive formula (13.5) from the basic product-moment correlation formula
$r = \sum xy / N s_x s_y$.


CHAPTER 14 


TRANSFORMATIONS: THEIR NATURE 
AND PURPOSE 


14.1. Introduction 


Many varieties of transformations are used in the interpretation and
analysis of statistical data. A transformation is any systematic alteration
in a set of observations whereby certain characteristics of the set are changed
and other characteristics remain unchanged. The representation of a set
of observations $X$ as deviations from the mean $X - \bar{X} = x$ is a simple
transformation. The mean of the transformed values is zero. All other
characteristics of the transformed values are the same as those of the original
values. The variability, skewness, and kurtosis remain unchanged. The
ordinal properties of the data are preserved. The rank ordering of the
observations is the same as before. The transformation of a variable $X$ to
standard-score form $(X - \bar{X})/s = z$ results in a change both in mean and
standard deviation. The mean of the transformed values is zero, and the
standard deviation is unity. Skewness, kurtosis, and rank order are
unchanged.
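These statements about what changes and what does not can be checked on a small set of observations. The sketch below is illustrative only: the eight observations are hypothetical, the helper names are assumptions, and skewness and kurtosis are taken as the usual ratios of central moments with divisor N.

```python
from math import sqrt


def describe(values):
    """Mean, standard deviation (divisor N), skewness, and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2


def rank_order(values):
    """Positions of the observations from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


x = [3, 5, 6, 6, 8, 9, 12, 20]                  # hypothetical observations
mean_x, sd_x, skew_x, kurt_x = describe(x)

deviations = [v - mean_x for v in x]            # x = X - mean
z = [(v - mean_x) / sd_x for v in x]            # z = (X - mean) / s

print([round(v, 3) for v in describe(deviations)])
print([round(v, 3) for v in describe(z)])
```

Deviation scores change only the mean; standard scores change the mean and the standard deviation; skewness, kurtosis, and rank order are the same in all three sets.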

Certain commonly used transformations change the shape of the frequency 
distribution of the variable. The variable may, for example, be transformed 
to the normal form. This may involve not only a change in mean and 
standard deviation, but also a change in skewness and kurtosis. The original 
observations may be negatively skewed and leptokurtic. The transformed 
values may be normally distributed, or approximately so. This type of 
transformation does not change the rank order of the observations. The 
transformations most commonly used by psychologists that alter the shape 
of the frequency distribution are to the normal and rectangular forms. The 
conversion of a set of observations to percentile ranks is a transformation 
to a rectangular distribution. 

The conversion of a set of frequencies $f_1, f_2, f_3, \ldots, f_k$ to
proportions by dividing each frequency by $N$, or to percentages by dividing by
$N$ and multiplying by 100, is a simple transformation. The ordering of the
transformed values is the same as the ordering of the original frequencies. If
each frequency is divided by different values of $N$, say
$N_1, N_2, N_3, \ldots, N_k$, then the transformed values will quite probably
have an order different from the original values. The conversion of a mental
age to an intelligence quotient by dividing by chronological age and
multiplying by 100 is a transformation which changes the ordinal properties of
the data. In converting mental ages to intelligence quotients, not only is the
order changed, but also the mean, standard deviation, skewness, and kurtosis.
The transformed values have a mean of 100 in the standardization group and are
approximately normally distributed with a known standard deviation.

Transformations are used for a variety of reasons. The use of transformed
values may assist understanding and algebraic manipulation. The correlation
coefficient, for example, may be written as
$r = \sum (X - \bar{X})(Y - \bar{Y})/N s_x s_y$. Transformed to standard measure
it becomes $r = \sum z_x z_y / N$. The correlation coefficient is observed to be
a function of the standard scores. It is their average product. This means in
effect that the original values $X$ and $Y$ may be transformed by the addition
of constants, thus changing the means $\bar{X}$ and $\bar{Y}$, and by
multiplying by constants, thus changing the standard deviations $s_x$ and
$s_y$, and the correlation coefficient remains unchanged. The correlation
coefficient may be said to be independent of, or invariant under,
transformations which involve adding or multiplying the variate values by
constant factors. In ordinary algebraic work it is usually easier to manipulate
standard scores than the original observations. In computation considerable use
is made of transformed values. For example, in calculating a mean from grouped
data a computation variable may be used which is a deviation from an arbitrary
origin in units of class interval. The mean of this variable is calculated, and
a simple formula applied to convert this mean back to the mean of the original
observations. The purpose of this is to save arithmetic.
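The invariance of the correlation coefficient may be illustrated as follows. The sketch uses two short hypothetical sets of observations and helper names assumed here. The multipliers are taken as positive; multiplying one variable by a negative constant leaves the size of r unchanged but reverses its sign.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor N."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def standard_scores(values):
    m, s = mean(values), sd(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sd(x) * sd(y))


x = [2, 4, 5, 7, 9, 12]        # hypothetical observations
y = [1, 3, 2, 6, 5, 9]

r = pearson_r(x, y)

# r as the average product of standard scores.
r_from_z = sum(a * b for a, b in zip(standard_scores(x), standard_scores(y))) / len(x)

# Add constants and multiply by positive constants.
x_new = [10 + 3 * v for v in x]
y_new = [0.5 * v - 4 for v in y]
r_new = pearson_r(x_new, y_new)

print(round(r, 4), round(r_from_z, 4), round(r_new, 4))
```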

In forms of correlational analysis, involving a number of variables, the
distributions of the variables may assume a variety of shapes. Some may be
negatively and others positively skewed. Some may be platykurtic, and
others leptokurtic. If correlation coefficients are computed, these
coefficients will not be altogether independent of the differences in the
shapes of the distributions. To achieve comparability it is a common practice to
transform all variables to an approximately normal form and compute the
correlations on transformed values. Such transformations may also have
the related effect of improving the fit of linear-regression lines to the data.

Raw scores on psychological tests are usually highly arbitrary. The
values of the mean, standard deviation, and possible range of scores reside
in large measure in the predilection of the test constructor. Unless the mean,
standard deviation, and something about the shape of the score distribution
are known, no proper interpretation can be attached to the original, or raw,
scores. Such scores are frequently transformed to normal distributions with
an agreed mean and standard deviation. For example, a psychological test
when administered to a representative sample of individuals from the
population for which the test is intended may have a mean of 37 and a standard
deviation of 9.6 and be positively skewed. Scores may be transformed to a
normally distributed variable with a mean of 100 and a standard deviation
of 16. Scores thus transformed to the normal form immediately take on
meaning. If an individual has a score of 116, we know that he is one standard
deviation unit above the average. Because the scores are normally
distributed we know that his performance is better than that of about 84 per
cent of the population and below the performance of about 16 per cent of the
population. The procedure for developing such a transformation is known
as standardization. A psychological test is said to be standardized when
transformed scores are available, based on a reference group of acceptable
size. The transformed scores themselves are called norms. An individual's
score takes on meaning in relation to a standard, or normative, group.
Tests are frequently standardized to permit age allowances. This means in
effect that separate norms have been prepared for each age group. The
average child in each age group may have a mean transformed score of, say,
100. The standard deviation of scores for each age group may be 16. Thus
a younger child may make a lower raw score than an older child but have a
considerably higher transformed score. Intelligence quotients are
transformed scores which make adjustments for the differing chronological ages
of children taking the test. Intelligence quotients are presumed to be
independent of chronological age within an accepted age range. Most
published tests are accompanied by manuals containing conversion tables
which permit the transformation of raw scores to standardized scores. Both
normal and rectangular transformations are used in test standardization.

The application of a t test for the significance of the difference between
two means assumes normality and equality of variance of the population 
distributions. The same assumptions underlie the use of the analysis of 
variance. In practice, data are often encountered which depart appreciably 
from the normal form and with unequal variances. Here the investigator 
has several avenues open to him. If the departures from normality and 
equality of variance are not too gross, he may apply the usual procedures, 
knowing that the data do not satisfy the assumptions required, and impose 
upon himself a more rigorous level of significance. Fairly marked departures 
from normality may occur, and the tests of significance will not be too 
seriously affected. Where the departures from normality and equality of 
variance are gross, a transformation is sometimes used. Square-root and 
logarithmic transformations are appropriate to certain classes of data. A 
square-root transformation converts X to √X; a logarithmic transformation
converts X to log X. Under the special circumstances where they are 
appropriate, these transformations may achieve approximate normality 
and equality of variance. 




An example of the practical utility of a transformation is Fisher's z_r
transformation, used in tests of significance of correlation coefficients and
described in Chap. 10. The variance and shape of the sampling distribution of
the correlation coefficient vary as a function of the population value ρ. The
transformed values z_r are approximately normally distributed and nearly
independent of ρ, with a standard deviation close to 1/√(N − 3).
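
As a brief illustration, the transformation and its approximate standard deviation may be sketched in Python. The sketch assumes the usual form of Fisher's transformation, z_r = ½ log_e [(1 + r)/(1 − r)], which is the inverse hyperbolic tangent of r; the function names and the sample values r = .60 and N = 28 are hypothetical and are not taken from the text.

```python
import math

def fisher_z(r):
    """Fisher's z_r for a correlation r: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def fisher_z_sd(n):
    """Approximate standard deviation of z_r for a sample of n pairs."""
    return 1.0 / math.sqrt(n - 3)

# Hypothetical sample values, chosen only for illustration.
z_r = fisher_z(0.60)
sd = fisher_z_sd(28)
print(round(z_r, 3), round(sd, 3))  # 0.693 0.2
```

Note that the standard deviation depends on N alone and not on the size of the correlation, which is the property that makes the transformed values convenient in tests of significance.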

Tests of significance may be applied which are independent of the shapes 
of the population distributions. These tests are known as distribution-free, 
or nonparametric, tests (Chap. 17). Such tests in effect transform the 
original measurements to ranks or signs. A rank transformation simply
converts measurements to the integers 1, 2, 3, . . . , N. Subsequent
calculation and interpretation are based on these integers. A sign trans- 
formation converts the measurements to plus and minus signs. Observations 
above a median value may be assigned a plus, and those below a minus. The 
reduction of data to their rank and sign properties leads to a loss of infor- 
mation. More observations are required to achieve significance at an 
accepted level of significance. 
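
The two transformations may be sketched as follows. The five measurements are hypothetical and contain no ties; assigning a zero to an observation exactly at the median is our assumption, since the text speaks only of observations above and below it.

```python
from statistics import median

scores = [12.5, 7.1, 9.8, 15.0, 3.2]  # hypothetical measurements, no ties

# Rank transformation: the smallest value receives rank 1, the largest rank N.
ordered = sorted(scores)
ranks = [ordered.index(x) + 1 for x in scores]

# Sign transformation relative to the median of the sample.
m = median(scores)
signs = ['+' if x > m else '-' if x < m else '0' for x in scores]

print(ranks)  # [4, 2, 3, 5, 1]
print(signs)  # ['+', '-', '0', '+', '-']
```

The distances between the original measurements no longer appear in either result, which is the loss of information referred to above.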

The reader will observe that a persisting theme underlying many trans- 
formations is independence, or invariance. By transforming the original 
observations to standard measure, or the equivalent, meaningful comparisons 
between variables may be made which are independent of the means and 
standard deviations. By transforming the original observations to a model 
distribution, perhaps normal or rectangular, comparisons may be made which 
are independent of the idiosyncratic shapes of the original distributions. 
The transformation of original measurements to intelligence quotients 
results in a variable which is roughly independent of chronological age. 
Meaningful comparisons may thereby be made between children of different 
ages. The z_r values obtained by Fisher's transformation are nearly independent of
the population parameter ρ. The reduction of data in nonparametric
statistics to ranks and signs leads, at some cost, to tests of significance 
which are independent of the shapes of the population distributions. Clearly, 
the essence of the idea of a transformation is the attainment of a variable 
which is independent of, or invariant with respect to, certain other variable 
properties for the purpose of achieving desired and meaningful comparisons. 


14.2. Transformations to Standard Measure 


A standard score is a deviation from the mean divided by the standard
deviation; thus z = (X − X̄)/s. The mean is the origin, and the standard
deviation is the unit of measurement. Thus a particular value is z standard
deviation units above or below the mean. The mean of z scores is zero, and
the standard deviation is unity. The skewness and kurtosis of the dis-
tribution are unchanged. The distribution of z scores has the same shape as
the distribution of X. Standard scores on two or more variables are directly 
comparable only in the sense that they have the same mean and standard 
deviation. 

A standard-score transformation does not change the proportionality of
scale intervals. If X_1, X_2, and X_3 are three measurements in raw-score
form and z_1, z_2, and z_3 are the same three measurements in standard-score
form, then

\[ \frac{X_1 - X_2}{X_2 - X_3} = \frac{z_1 - z_2}{z_2 - z_3} \]

This means that the relative distances between the variate values remain
unchanged under a standard-score transformation. Let X_1, X_2, and X_3 be
20, 30, and 50. If X̄ = 40 and s = 15, then z_1, z_2, and z_3 become −1.33,
−.67, and .67. We note that

\[ \frac{20 - 30}{30 - 50} = \frac{-1.33 + .67}{-.67 - .67} = .5 \]

Standard scores involve the use of decimals and plus and minus signs. 
This is sometimes inconvenient. Also the range of values will seldom exceed 
the limits −3 and +3. It is not uncommon to select an arbitrary origin
and standard deviation to ensure that all, or nearly all, the measures have a 
plus sign and that decimals are eliminated. For this purpose a mean of 50 
and a standard deviation of 15 are sometimes used. If z′ denotes this type
of score, then

\[ z' = 50 + 15\left(\frac{X - \bar{X}}{s}\right) = 50 + 15z \]

To change the standard deviation we multiply every standard score by 15.
To change the origin we merely add 50. Values of z′ are rounded to the
nearest integer. In comparing performance on a series of tests standard-
score values z′ are more convenient than z. Of course, any other mean and
standard deviation could be selected.
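
A minimal sketch of this change of origin and unit follows. The function name and its default arguments are ours; the raw scores are those of the earlier example, with a mean of 40 and a standard deviation of 15.

```python
def rescaled_score(x, mean, sd, new_mean=50, new_sd=15):
    """Standard score re-expressed with a chosen mean and standard deviation."""
    z = (x - mean) / sd
    # Python's round() rounds exact halves to the nearest even integer.
    return round(new_mean + new_sd * z)

converted = [rescaled_score(x, mean=40, sd=15) for x in (20, 30, 50)]
print(converted)  # [30, 40, 60]
```

Any other mean and standard deviation may be obtained by changing the two default arguments.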


14.3. Percentile Points and Percentile Ranks 


In the standardization of psychological tests transformations to percentile 
ranks have frequent application. Such transformations are rectangular. 
Each percentile rank has the same frequency of occurrence. The frequency 
distribution is flat. 

A clear distinction must be made between percentile points and percentile 
ranks. If k per cent of the members of a sample have scores less than a 
particular value, that value is the kth percentile point. It is a value of the
variable below which k per cent of individuals lie. On an examination, if
85 per cent of individuals score less than 60, then 60 is the 85th percentile 
point. If a frequency distribution is represented graphically and ordinates 
raised at all percentile points, the total area under the frequency distribution 
is divided into 100 equal parts. 

Percentile points may be represented by the symbols P_0, P_1, P_2, . . . , P_100.
The points P_0 and P_100 are limits which include all members of the sample.
A percentile rank, as distinct from a percentile point, is a value on the
transformed scale corresponding to the percentile point. If 60 is a score
below which 85 per cent of individuals fall, then 85 is the corresponding 
percentile rank. As in all transformations, values on the original scale 
correspond to certain values on the transformed scale. In the present context 
the values on the original scale are percentile points, the corresponding values 
on the transformed scale are percentile ranks. 

The reader will recall that the median is a value of the variable above and 
below which 50 per cent of cases lie. The median is the 50th percentile point, 
P_50. The upper quartile is a value of the variable above which 25 per
cent of cases and below which 75 per cent of cases lie; conversely for the
lower quartile. The upper quartile is the 75th percentile point, or P_75, and
the lower quartile is the 25th percentile point, or P_25. Decile points are
sometimes used. These, as the name implies, involve a dividing into tenths. 
A decile point is a value of the variable below which a certain percentage of 
individuals fall, the percentage being taken in units of 10.  Decile ranks are 
transformed values corresponding to the decile points and taking the integer 
values 1 to 10. The median is the 5th decile. An ordinate at the median 
divides the area under the frequency distribution into 2 equal parts; ordinates 
at the upper quartile, the median, and the lower quartile divide the area into 
4 equal parts; ordinates at the decile points divide the area into 10 equal 
parts; ordinates at the percentile points divide the area into 100 equal parts. 

For small N the computation of percentile points and percentile ranks is 
not a very meaningful procedure. Given the scores 8, 17, 23, 42, 61, and 
63, obviously little meaning could possibly attach to Pao or Ps. The con- 
version of these scores to percentile ranks would be a somewhat spurious 
procedure, with no advantage over ordinary ranks. 


14.4. Computation of Percentile Points and Ranks—Ungrouped 
Data 


To illustrate the computation of percentile points and ranks for ungrouped 
data, consider the psychological-test scores tabulated in Table 14.1. We 
adopt the convention that any score value X has exact limits given by
X − .5 and X + .5. The variable is presumed to be continuous. Thus the
score 116 has exact limits 115.5 and 116.5. This convention is the same as 




TABLE 14.1 
PSYCHOLOGICAL TEST SCORES FOR A GROUP OF 60 CHILDREN ARRANGED IN ORDER
Individual Score Individual Score Individual Score 

1 83 21 110 41 123 
2 88 22 110 42 124 
3 88 23 110 43 124 
4 91 24 110 44 125 
5 91 25 111 45 125 
6 93 26 112 46 125 
7 93 27 114 47 126 
8 93 28 115 48 126 
9 97 29 116 49 127 
10 98 30 116 50 128 
11 98 31 116 51 130 
12 98 32 117 52 130 
13 100 33 118 53 131 
14 101 34 119 54 132 
15 103 35 120 55 135 
16 107 36 121 56 135 
17 107 37 122 57 136 
18 108 38 123 58 136 
19 109 39 123 59 136 
20 110 40 123 60 139 


that used in determining the exact limits of class intervals. Let us now
calculate P_40, the 40th percentile point, the point below which 40 per cent of
individuals lie. N = 60, and 40 per cent of this is 24. The 24th individual
has a score of 110, the exact upper limit is 110.5, and this is taken as the point
on the test scale below which 24 individuals lie. Thus P_40 = 110.5. Note
in this case that the 25th individual has a score of 111. The exact lower
limit of this score is also 110.5. Consider now the calculation of P_20. We
require a point on the test scale below which 12 and above which 48 indi-
viduals lie. The score of the 12th individual is 98 with an upper limit of
98.5. We note also that the score of the 13th individual is 100 with a lower
limit of 99.5. Presumably the required point falls somewhere between 98.5 and
99.5. It is indeterminate. As an arbitrary working procedure the percentile
P_20 may be taken halfway between these two values. Thus P_20 = 99.0.
To illustrate the handling of ties in the computation of percentile points let
us calculate P_10. A score is required below which 6 and above which 54
individuals fall. We note that individuals 6, 7, and 8 have the same score,
93. Thus three individuals have scores within the exact limits 92.5 and 93.5.
Since five individuals fall below 92.5 and we require a point below which 6
individuals fall, we interpolate one-third of the way into this interval.
One-third of this interval is .33, and
P_10 = 92.50 + .33 = 92.83. With the above data P_0 may be taken as the
lower exact limit of the lowest score, or 82.5. Similarly, P_100 may be taken
as the upper exact limit of the highest score, or 139.5. 
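
The working rules of this section may be gathered into a short Python sketch. The function name is ours, and the sketch assumes integer scores with exact limits X − .5 and X + .5, tied cases spread evenly over their interval, and the halfway convention where the required point falls in a gap between two scores. The scores are those of Table 14.1.

```python
from collections import Counter

SCORES = [83, 88, 88, 91, 91, 93, 93, 93, 97, 98, 98, 98, 100, 101, 103,
          107, 107, 108, 109, 110, 110, 110, 110, 110, 111, 112, 114, 115,
          116, 116, 116, 117, 118, 119, 120, 121, 122, 123, 123, 123, 123,
          124, 124, 125, 125, 125, 126, 126, 127, 128, 130, 130, 131, 132,
          135, 135, 136, 136, 136, 139]

def percentile_point(scores, k):
    """Point on the score scale below which k per cent of the cases lie."""
    ordered = sorted(scores)
    n = len(ordered)
    target = k * n / 100              # number of cases required below the point
    if target <= 0:
        return ordered[0] - 0.5       # lower exact limit of the lowest score
    if target >= n:
        return ordered[-1] + 0.5      # upper exact limit of the highest score
    freq = Counter(ordered)
    values = sorted(freq)
    below = 0
    for i, value in enumerate(values):
        f = freq[value]
        if below + f > target:
            # The point lies inside this score's interval: interpolate.
            return (value - 0.5) + (target - below) / f
        if below + f == target:
            # The point lies between two scores: take the halfway value.
            return ((value + 0.5) + (values[i + 1] - 0.5)) / 2
        below += f

print(percentile_point(SCORES, 40))            # 110.5
print(percentile_point(SCORES, 20))            # 99.0
print(round(percentile_point(SCORES, 10), 2))  # 92.83
```

The three printed values agree with those obtained by hand above.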

The calculation of percentile ranks as distinct from percentile points is 
the reverse of the above process. Above we calculated scores corresponding 
to particular ranks. We may now attend to the calculation of ranks cor- 
responding to particular scores. To illustrate, consider individual 32 in 
Table 14.1. This individual is 32d from the bottom. His test score is 117. 
The number of individuals scoring below 117 is 31. The percentage below
is (31/60) × 100 = 51.67. The number scoring above 117 is 28. The percentage
is (28/60) × 100 = 46.67. These two percentages do not add to 100. Indi-
vidual 32 occupies (1/60) × 100 = 1.67 per cent of the total scale. His per-
centile rank falls between 51.67 and 51.67 + 1.67 = 53.33. We may take
the mid-point of this interval as the required percentile rank. Thus the
percentile rank corresponding to score 117 is 51.67 + 1.67/2 = 52.50. This
method assumes that any rank R covers the interval R − .5 to R + .5.

Consider the question of ties. We note that five individuals score 110. The
number of individuals scoring below 110 is 19, or (19/60) × 100 = 31.67 per cent of
the total. The number scoring above 110 is 36, or (36/60) × 100 = 60.00 per cent.
The number occupying the score position 110 is 5, or (5/60) × 100 = 8.33 per
cent. The required percentile rank may be taken as the mid-point of the
interval between 31.67 and 31.67 + 8.33 = 40.00. Thus the percentile rank of the
score 110 is 31.67 + 8.33/2 = 35.83.

Percentile ranks may be obtained by using the simple formula

\[ PR = 100\,\frac{R - .5}{N} \tag{14.1} \]

where R = rank of individual, counting from the bottom
      N = total number of cases
Where ties occur, R is taken as the average rank which the tied observations 
occupy. The average rank of the five individuals who score 110 is 22, and 
the corresponding percentile rank is, as before, 100(22 − .5)/60 = 35.83.
Percentile ranks are ordinarily rounded to the nearest whole number. Thus 
the rank 35.83 becomes 36. 
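
Formula (14.1), with the average-rank rule for ties, may be sketched as follows. The function name is ours; the scores are those of Table 14.1, and the average rank of tied cases is obtained as the number of cases below plus half of one more than the number tied.

```python
SCORES = [83, 88, 88, 91, 91, 93, 93, 93, 97, 98, 98, 98, 100, 101, 103,
          107, 107, 108, 109, 110, 110, 110, 110, 110, 111, 112, 114, 115,
          116, 116, 116, 117, 118, 119, 120, 121, 122, 123, 123, 123, 123,
          124, 124, 125, 125, 125, 126, 126, 127, 128, 130, 130, 131, 132,
          135, 135, 136, 136, 136, 139]

def percentile_rank(scores, x):
    """Percentile rank of score x by PR = 100(R - .5)/N, averaging tied ranks."""
    n = len(scores)
    below = sum(1 for s in scores if s < x)
    tied = sum(1 for s in scores if s == x)
    if tied == 0:
        raise ValueError("score does not occur in the sample")
    average_rank = below + (tied + 1) / 2   # e.g. ranks 20 to 24 average 22
    return 100 * (average_rank - 0.5) / n

print(percentile_rank(SCORES, 117))            # 52.5
print(round(percentile_rank(SCORES, 110), 2))  # 35.83
```

Both results agree with the values reached above by the mid-point argument, as they must, since the two procedures are algebraically the same.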


14.5. Calculation of Percentile Points and Ranks—Grouped Data 


The calculation of percentile points and ranks for grouped data will be 
discussed with reference to the data of Table 14.2. Cumulative frequencies 
are recorded in col. 3, and cumulative percentages in col. 4. 

Let us calculate P_25. N = 200, and 25 per cent of N is 50. We observe
that the 50th case falls within the interval 65 to 69. The exact limits of this 
interval are 64.5 and 69.5. We must now interpolate within the interval to 
locate a point below which 50 cases fall. We note that 36 cases fall below 




TABLE 14.2
CUMULATIVE FREQUENCIES AND PERCENTAGES OF TEST SCORES

   (1)          (2)           (3)            (4)
  Class                    Cumulative     Cumulative
 interval    Frequency     frequency      percentage

  95-99          1            200           100.0
  90-94          6            199            99.5
  85-89          8            193            96.5
  80-84         33            185            92.5
  75-79         40            152            76.0
  70-74         50            112            56.0
  65-69         26             62            31.0
  60-64         14             36            18.0
  55-59         10             22            11.0
  50-54          6             12             6.0
  45-49          4              6             3.0
  40-44          2              2             1.0

  Total        200


and 26 cases within the interval containing P_25. To arrive at the 50th case
we require 14 of the 26 cases within the interval. Thus we take 14/26 of the interval
64.5 to 69.5. This is (14/26) × 5 = 2.69. We add this to the lower limit of the
interval to obtain P_25, which is 64.5 + 2.69 = 67.19.
The following formula may be used to calculate percentile points:

\[ P_i = L + \frac{p_i N - F}{f_i}\,h \tag{14.2} \]

where P_i = ith percentile point
      p_i = proportion corresponding to ith percentile point; thus if i = 62,
            p_i = .62
      L = exact lower limit of interval containing P_i
      F = sum of all frequencies below L
      f_i = frequency of interval containing P_i
      h = class interval

For P_25 in Table 14.2 we have L = 64.5, p_i = .25, F = 36, f_i = 26, and
h = 5. Thus

\[ P_{25} = 64.5 + \frac{.25 \times 200 - 36}{26} \times 5 = 67.19 \]

This result is identical with that obtained previously for P_25. The reader
will observe that for P_50 this formula is the same as that given previously for
calculating the median from grouped data.
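
Formula (14.2) may be sketched in Python with the frequencies of Table 14.2. The names used are ours. Each interval is entered by its lower score limit, from the lowest interval upward, and the exact lower limit L is taken as that limit less .5.

```python
# (lower score limit, frequency) for each class interval of Table 14.2.
INTERVALS = [(40, 2), (45, 4), (50, 6), (55, 10), (60, 14), (65, 26),
             (70, 50), (75, 40), (80, 33), (85, 8), (90, 6), (95, 1)]
H = 5  # class interval

def percentile_point_grouped(intervals, h, k):
    """kth percentile point from grouped data: L + ((pN - F) / f) * h."""
    n = sum(f for _, f in intervals)
    target = k * n / 100                 # pN, the number of cases required below
    cumulative = 0                       # F, the sum of frequencies below L
    for lower, f in intervals:
        if f > 0 and cumulative + f >= target:
            exact_lower = lower - 0.5    # L
            return exact_lower + (target - cumulative) / f * h
        cumulative += f
    raise ValueError("k must lie between 0 and 100")

p25 = percentile_point_grouped(INTERVALS, H, 25)
print(round(p25, 2))  # 67.19
```

Setting k = 50 gives the median by the same formula.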




The calculation of percentile ranks is the reverse of the above procedure. 
The cumulative percentages shown in col. 4 of Table 14.2 are the percentile 
ranks corresponding to the exact top limits of the intervals. Thus 56.0 is 
the percentile rank corresponding to the percentile point 74.5, the exact top 
limit of the interval 70 to 74. Likewise 11.0 is the percentile rank correspond- 
ing to the percentile point 59.5, the exact top limit of the interval 55 to 59. 
The percentile rank of any score may be obtained by interpolation. What is 
the percentile rank corresponding to the score 81? The score 81 falls within 
an interval with exact limits 79.5 and 84.5. It is 1.5 score units above the 
bottom of this interval. The lower limit has a percentile rank of 76.0, and 
the upper limit 92.5. Thus we have two points on the score scale correspond- 
ing to two points on the percentile-rank scale. Five units on the score scale 
is equal to 92.5 − 76.0 = 16.5 units on the percentile-rank scale, and 1.5
units on the score scale is equal to (92.5 − 76.0)1.5/5 = 4.95 units on the
rank scale. We now take 76.0 + 4.95 = 80.95 as the percentile rank of the 
score 81. Rounding this to the nearest integer we obtain a rank of 81. It is
pure coincidence that in this case the percentile rank is numerically equal to 
the score. 

The steps involved in finding percentile ranks from grouped data may be 
summarized as follows: 

1. Find the exact lower limit of the interval containing the score X whose 
percentile rank is required. 

2. Find the difference between X and the lower limit of the interval 
containing it. 

3. Divide this by the class interval and multiply by the percentage within 
the interval. 

4. Add this to the percentile rank corresponding to the bottom of the 
interval. 
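
The four steps above may be sketched as follows, again with the frequencies of Table 14.2. The function name is ours, and a score is assumed to lie within the tabulated range.

```python
# (lower score limit, frequency) for each class interval of Table 14.2.
INTERVALS = [(40, 2), (45, 4), (50, 6), (55, 10), (60, 14), (65, 26),
             (70, 50), (75, 40), (80, 33), (85, 8), (90, 6), (95, 1)]
H = 5  # class interval

def percentile_rank_grouped(intervals, h, x):
    """Percentile rank of score x by linear interpolation within its interval."""
    n = sum(f for _, f in intervals)
    cumulative = 0
    for lower, f in intervals:
        exact_lower = lower - 0.5                    # step 1
        if exact_lower <= x < exact_lower + h:
            distance = x - exact_lower               # step 2
            pct_within = 100 * f / n
            increment = distance / h * pct_within    # step 3
            rank_at_bottom = 100 * cumulative / n
            return rank_at_bottom + increment        # step 4
        cumulative += f
    raise ValueError("score lies outside the tabulated range")

pr_81 = percentile_rank_grouped(INTERVALS, H, 81)
print(round(pr_81, 2))  # 80.95
```

Applying the function to every possible score value in turn yields the conversion table discussed in the next paragraph.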

Usually, where percentile ranks are calculated we are interested in prepar- 
ing a table for converting any score value to percentile ranks. Thus for 
every possible score, we require the corresponding percentile rank. This
may be done by systematically computing all percentile-rank values in the 
manner described above. A somewhat easier procedure is to make a graphi- 
cal plotting on suitable graph paper of cumulative percentages against the 
corresponding upper limits of the class intervals. Score values are plotted 
on the horizontal axis, and cumulative percentages on the vertical axis. 
The points may be joined by straight lines. Percentile ranks corresponding 
to scores may then be read directly from the graph. If the points are joined
by straight lines, these rank values will be the same, within limits of error, 
as those obtained by linear interpolation directly on the numerical values. 
If the sample is small, the points when plotted may show considerable 
irregularity and it may be advisable to fit a smoothed curve to the data. 
The fitting of a smoothed curve by freehand methods is accurate enough for
most practical purposes. A procedure related to the method described above 
is to calculate certain selected percentile points and then interpolate either 
numerically or graphically between these points. The percentile points 
P_10, P_20, P_30, . . . , P_90 may be calculated. To achieve greater accuracy at
the tails of the distribution it may be desirable to calculate P_1 and P_5; also
P_95 and P_99.


14.6. Normal Transformations 


The transformation of a variable to the normal form is a frequent procedure 
in test standardization and correlational analysis. Not uncommonly, test 
norms are normal transformations of the original raw scores with arbitrarily 
selected means and standard deviations. A type of normal transformation 
used by educationists is a T score. T scores are normally distributed, usually
with a mean of 50 and a standard deviation of 10. A normal transformation 
with a mean of 100 and a standard deviation of 15, or thereabouts, resembles 
an IQ scale. 


TABLE 14.3
POINTS ON THE BASE LINE OF THE UNIT NORMAL CURVE CORRESPONDING TO
SELECTED PERCENTILE RANKS

 Percentile      Standard
    rank         deviation

     99            +2.33
     95            +1.64
     90            +1.28
     80            +0.84
     70            +0.52
     60            +0.25
     50             0.00
     40            −0.25
     30            −0.52
     20            −0.84
     10            −1.28
      5            −1.64
      1            −2.33


Transforming a set of scores to the normal form is a relatively simple 
procedure. Every percentile rank corresponds to a point on the base line 
of the unit normal curve measured from a mean of zero in standard deviation 
units. A percentile rank of 50 corresponds to the zero point. A rank of 
60 is .25 standard deviation units above the mean. A rank of 70 is .52 
standard deviation units above the mean. Table 14.3 shows points on the 
base line of the unit normal curve corresponding to selected percentile ranks.
These and other points are readily obtained from any table of areas under 
the normal curve (Table A of the Appendix). 
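
Where a printed table of areas is not at hand, the same points may be obtained from the inverse of the normal distribution function. The sketch below uses NormalDist from Python's standard statistics module (available from Python 3.8); the variable names are ours.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

ranks = [99, 95, 90, 80, 70, 60, 50, 40, 30, 20, 10, 5, 1]

# Point on the base line below which the given percentage of the area falls.
points = {pr: unit_normal.inv_cdf(pr / 100) for pr in ranks}

for pr in ranks:
    print(f"{pr:3d}  {points[pr]:+.2f}")
```

Rounded to two decimals, the points printed reproduce the entries of Table 14.3.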

In summary, the steps used in transforming a variable to the normal form 
are as follows. Percentile ranks corresponding to certain points on the 
score scale may be calculated. A table of areas under the normal curve is 
used to find the points on the base line of the unit normal curve corresponding 
to these percentile ranks. These points correspond to the percentile points 
on the original score scale. Thus a correspondence is established between 
a set of points on the original score scale and points on a normal distribution 
of zero mean and unit standard deviation. Percentile ranks are stepping-
stones in establishing this correspondence. The normal standard scores are
multiplied by a constant to obtain any desired standard deviation of the 
transformed values. A constant is usually added to produce a change in 
means, thus eliminating negative signs. A transformed value corresponding 
to any score value on the original scale may be obtained by interpolation. 

Some freedom of choice is possible in the selection of a set of points on the 
score scale with associated percentile ranks. First, we may use the exact 
top limits of the intervals and obtain the corresponding percentile ranks from 
the cumulative-percentage frequencies. Second, we may take the mid-points 
of the class intervals and obtain percentile ranks corresponding to these. 
Third, we may use a selected set of percentile points with associated percentile 
ranks. Thus P_10, P_20, P_30, . . . , P_90 may be used. P_1, P_5 and P_95, P_99
may be added at the tails as a refinement. Fourth, we may select certain
equally spaced points on the normal standard-score scale and ascertain their 
percentile ranks and the corresponding percentile-point scores. These 
equally spaced points may, for example, be −2.5, −2.0, −1.5, . . . , +1.5,
+2.0, +2.5. The difference between the four alternatives outlined above is 
a matter of units. The first uses units of class interval of the original 
variable, a unit extending from the top of one interval to the top of the next. 
The second also uses units of class interval of the original variable, a unit 
extending from the mid-point of one interval to the mid-point of the next. 
The third, excluding the tails, uses equal units on the percentile-rank scale. 
The fourth alternative uses equal units on the normal standard-score scale. 
While minor advantages may be claimed for one procedure in preference to 
another, the differences, where N is fairly large, are trivial. Any one of the
four procedures is satisfactory enough for most practical purposes. 

To illustrate the transformation of a set of scores to the normal form we 
shall use the second alternative and take the mid-points of the class intervals 
with their corresponding percentile ranks. Table 14.4 shows a frequency 
distribution of test scores. Column 2 shows the exact mid-points of the class 
intervals. Column 3 shows the frequencies. Column 4 shows the cumula- 
tive frequencies to the mid-points. These are the cumulative frequencies to 
the bottom of the interval plus half the frequencies within the interval.
The number of cases below the interval 60 to 64 is 22. The number within 
the interval is 14. Half this number is 7. The cumulative frequency to the
mid-point is 22 + 7 = 29. Column 5 shows the cumulative percentage
frequencies to the mid-points. These cumulative percentage frequencies 
are percentile ranks corresponding to the mid-points of the intervals. The 
numbers in col. 6 are points on the base line of the unit normal curve in 
standard deviation units from a zero mean. The percentage of the area of 
the unit normal curve falling below a standard score of 2.81 is 99.75, the 
percentage below a standard score of 2.06 is 98.00, and so on. These values 
are normalized standard scores corresponding to the mid-points of the original 
score intervals. Thus we have a set of values on the original scale paired
with a set of values on a normal transformed scale. Transformed values 
corresponding to any score on the original scale may be obtained by either 
arithmetical or graphical interpolation. 


TABLE 14.4
ILLUSTRATION OF THE TRANSFORMATION OF SCORES TO A NORMAL DISTRIBUTION:
DATA OF TABLE 14.2

Columns: (1) class interval; (2) mid-point; (3) frequency; (4) cumulative
frequency to mid-point; (5) cumulative percentage to mid-point; (6) normal
standard deviation unit z; (7) z × 10; (8) T score, z × 10 + 50


Table 14.4 shows a T-score transformation. In col. 7 the standard scores 
of col. 6 are multiplied by 10, thus yielding transformed scores with a standard 
deviation of 10. In col. 8 a constant value 50 is added to the values of col. 7,
thus changing the origin from zero to 50 and eliminating negative values. 
If we had multiplied by 15 and added 100, the transformed values would
have a standard deviation of 15 and a mean of 100. Any other standard 
deviation and mean could be used. 
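
The whole procedure, from the frequencies of Table 14.2 to T scores, may be sketched as follows. The function name and row layout are ours, and the normal deviates are taken from NormalDist in Python's standard statistics module rather than from a printed table, so a value may occasionally differ in the second decimal from one read from a table; for a cumulative percentage of 98.00, for instance, the sketch gives 2.05 where the text quotes 2.06.

```python
from statistics import NormalDist

# (lower score limit, frequency) for each class interval of Table 14.2.
INTERVALS = [(40, 2), (45, 4), (50, 6), (55, 10), (60, 14), (65, 26),
             (70, 50), (75, 40), (80, 33), (85, 8), (90, 6), (95, 1)]
H = 5  # class interval

def normalized_scores(intervals, h, mean=50, sd=10):
    """Rows of (mid-point, cum. freq. to mid-point, cum. percentage, z, T)."""
    n = sum(f for _, f in intervals)
    unit_normal = NormalDist()
    rows, cumulative = [], 0
    for lower, f in intervals:
        midpoint = lower + (h - 1) / 2        # e.g. 62 for the interval 60-64
        cum_to_mid = cumulative + f / 2       # cases below, plus half of those within
        pct = 100 * cum_to_mid / n            # percentile rank of the mid-point
        z = unit_normal.inv_cdf(pct / 100)    # normalized standard score
        rows.append((midpoint, cum_to_mid, pct, round(z, 2), round(mean + sd * z)))
        cumulative += f
    return rows

rows = normalized_scores(INTERVALS, H)
for row in rows:
    print(row)
```

Calling the function with mean=100 and sd=15 gives the alternative scale mentioned above.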


14.7. The Stanine Scale 


During World War II the United States Army Air Force Aviation Psy- 
chology Program used a stanine scale. Scores on psychological tests were 
converted to stanines. A stanine scale is an approximately normal trans- 
formation. A coarse grouping is used, only nine score categories being 
allowed. The transformed values are assigned the integers 1 to 9. The 
mean of a stanine scale is 5, and the standard deviation is 1.96. The per-
centages of cases in the stanine-score categories from 1 to 9 are 4, 7, 12, 17, 20,
17, 12, 7, and 4. Thus 4 per cent have a stanine score 1, 7 per cent a score 
2, 12 per cent a score 3, and so on. If a set of scores is ordered from the 
lowest to the highest, the lowest 4 per cent assigned a score 1, the next lowest
7 per cent a score 2, the next lowest 12 per cent a score 3, and the process continued until the top
4 per cent receives a score of 9, the transformed scores are roughly normal 
and form a stanine scale. Stanine scores correspond to equal intervals in 
standard deviation units on the base line of the unit normal curve. A
stanine of 5 covers the interval from −.25 to +.25 in standard deviation
units. Roughly 20 per cent of the area of the unit normal curve falls within 
this interval. A stanine of 6 covers the interval +.25 to +.75 in standard 
deviation units. Roughly 17 per cent of the area of the unit normal curve 
falls within this interval. The interval used is one-half a standard deviation
unit; a stanine of 9 includes all cases above +1.75, and a stanine of 1 all
cases below −1.75 standard deviation units. Test scores can rapidly be
converted to stanines. A stanine transformation is a simple method of 
converting scores to an approximate normal form. The grouping, although 
coarse, is sufficiently refined for many practical purposes. 
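
The properties of the stanine scale stated above may be verified with a short sketch. The names are ours. The cut points assume intervals of one-half a standard deviation unit centred on the mean, which places the outermost cuts at −1.75 and +1.75; these are the points that leave roughly 4 per cent of the area in each tail.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

PERCENTAGES = [4, 7, 12, 17, 20, 17, 12, 7, 4]   # stanines 1 to 9
CUTS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]

# Mean and standard deviation of the stanine scale from its percentages.
mean = sum(s * p for s, p in zip(range(1, 10), PERCENTAGES)) / 100
variance = sum(p * (s - mean) ** 2 for s, p in zip(range(1, 10), PERCENTAGES)) / 100
sd = math.sqrt(variance)

# Percentage of the unit normal curve falling within each stanine interval.
unit_normal = NormalDist()
cumulative = [0.0] + [unit_normal.cdf(c) for c in CUTS] + [1.0]
areas = [100 * (hi - lo) for lo, hi in zip(cumulative, cumulative[1:])]

def stanine(z):
    """Stanine corresponding to a normal standard score z."""
    return bisect_right(CUTS, z) + 1

print(mean, round(sd, 2))         # 5.0 1.96
print([round(a) for a in areas])  # [4, 7, 12, 17, 20, 17, 12, 7, 4]
```

The areas under the normal curve, rounded to whole percentages, coincide with the percentages assigned to the nine categories.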


14.8. Regression Transformations 


The data resulting from certain psychological experiments are comprised 
of a set of initial measurements, obtained in the absence of an experimental 
treatment, and a set of subsequent measurements obtained on the same sub- 
jects in the presence of an experimental treatment. These latter measure- 
ments are a function both of the initial measurements and the effects of 
the experimental treatment. The investigator may wish to transform the 
measurements obtained under the treatment to a new variable which is 
independent of the initial measurements, the transformed variable being the 
object of further analysis. To illustrate, measures of motor performance 
may be obtained both in the absence and the presence of a stress agent. 
The scores obtained under stress conditions are not independent of the
initial scores. A person may have a low score under stress because his 
initial level of motor performance is low, or he may have a high score because 
his initial level is high, quite apart from the effects of the stress agent. We 
require a transformation that removes the effect of the initial values. The 
variation in the transformed measurements is presumably the result of the 
stress agent, the effects of initial level of performance being removed. 

Various approaches to this problem have been used. Some investigators 
have employed difference scores, the presumption here being that the 
increase or decrease in score over the initial value must result from the 
experimental treatment. Other investigators have used ratio scores. These 
methods do not achieve independence with respect to initial values. A 
straightforward approach to this problem is to remove the effects of initial 
values by simple linear regression, assuming of course that a linear-regression 
model is appropriate to the data. 

Let X_0 and X_1 be scores obtained under the two conditions. Let z_0 and
z_1 be the corresponding standard scores. The regression equation for pre-
dicting z_1 from z_0 is z′_1 = r_01 z_0, where r_01 is the correlation between measures
obtained under the two conditions, and z′_1 is a standard score predicted from
the initial values. The values z′_1 are points on the regression line used in
predicting z_1 from z_0. The difference between z_1 and z′_1 is a deviation from
the regression line and may be written as z_1 − r_01 z_0. These deviations are
transformed values which are uncorrelated with, and in this sense independent of, the initial values. The
effect of initial performance level has been removed. The variation in the 
transformed values results from the experimental condition plus error. Of 
course, in any practical situation the data may be contaminated by other 
factors unless adequate controls are exercised.

The scores z_1 − r_01 z_0 are errors of estimation with zero mean and a standard
deviation given by √(1 − r_01²). They may be expressed in standard-score
form by writing

\[ \delta = \frac{z_1 - r_{01} z_0}{\sqrt{1 - r_{01}^2}} \tag{14.3} \]

In this form they may be referred to as δ scores, or delta scores. These
transformed scores have a mean of zero and a standard deviation of unity. 
Their skewness and kurtosis are not a simple matter. Such scores may be 
multiplied by a constant to obtain any desired standard deviation. Any 
constant may be added to change the mean. 
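
Formula (14.3) may be sketched as follows. The function names and the two short sets of scores are hypothetical and serve only as an illustration. The sketch computes standard deviations with N as the divisor, an assumption which ensures that the correlation equals the mean product of the standard scores and that the resulting δ scores have exactly zero mean and unit standard deviation.

```python
from statistics import mean, pstdev

def standardize(xs):
    """Standard scores, using the standard deviation with divisor N."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def delta_scores(initial, treated):
    """Treated scores with the linear effect of the initial scores removed."""
    z0, z1 = standardize(initial), standardize(treated)
    r01 = mean([a * b for a, b in zip(z0, z1)])   # correlation between conditions
    scale = (1 - r01 ** 2) ** 0.5                 # standard deviation of the residuals
    return [(b - r01 * a) / scale for a, b in zip(z0, z1)]

# Hypothetical scores for six subjects without and with the treatment.
initial = [10, 12, 15, 11, 18, 14]
treated = [8, 11, 12, 7, 15, 13]

delta = delta_scores(initial, treated)
print([round(d, 2) for d in delta])
```

The δ scores so obtained correlate zero with the initial scores, which is the sense in which the effect of initial level has been removed; only the linear effect is removed, in keeping with the assumption of a linear-regression model.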

This type of simple regression transformation is quite general and is 
applicable in many situations where we wish to remove the effects of one 
variable on another. It has been used effectively by Lacey (1956) in the 
statistical treatment of autonomic-response data. 




14.9. Transformations with Age Allowances 


Any detailed consideration of a score transformation with age allowances 
is beyond the scope of this book. A few comments may, however, be appro- 
priate. This transformation is a variant of the regression transformation 
described in the previous section. Its purpose is to achieve comparability 
between children of different ages by transforming to a variable which is 
independent of chronological age. An older child A may have greater 
ability than a younger child B. Relative to his age group, however, his 
ability may be appreciably less. We require an answer to the question, 
how would child A compare with child B if both were the same age? This 
question is answered by a transformation to a variable which is independent 
of chronological age. Age transformations usually incorporate a normalizing 
process. The transformed scores are normally distributed with a fixed mean 
and standard deviation. 

Such a transformation may be effected in a variety of ways. One method 
involves the following general steps. Obtain the frequency distributions 
of scores for each age group. If the age group is restricted to 1 year, say 11, 
a frequency distribution may be prepared for each month of age, 11 years 
0 months, 11 years 1 month, 11 years 2 months, and so on. For a group 
covering a wider age range, 3- or 6-month intervals may be used. The next 
step is to compute certain selected percentile points for each frequency 
distribution. Let us calculate the 5th, 16th, 50th, 84th, and 95th percentile 
points. These percentiles correspond roughly to the points on the base line 
of the unit normal curve of —1.65, —1.00, 0.00, +1.00, and +1.65. If 
greater accuracy is required, additional percentile points may be calculated. 
We thus have 5 percentile points for each age group. We may now fit lines 
to these percentile points, using either mathematical or graphical methods. 
Thus we fit a line to all the 5th-percentile points, another line to the 16th- 
percentile points, another line to the 50th-percentile points, and so on. 
These lines describe the increase in score with increase in age at each
percentile-rank level. For a fairly narrow age range a straight line may prove
an adequate fit to the data. Such a line may be fitted by the method of least 
squares. For a broad age range the lines may exhibit certain curvilinear 
properties and it may be advisable to fit a smooth curve to the points using 
graphical methods. These percentile lines smooth out irregularities in the 
data. The original percentile points are replaced by points on these lines. 
Let us now assume that we require a transformed variable with a mean of 
100 and a standard deviation of 15. All percentile points on the
50th-percentile line correspond to a score of 100 on the transformed variable. All
percentile points on the 84th-percentile line correspond to a score of 115 on 
the transformed variable. All points on the 95th-percentile line correspond
to a score of 125 on the transformed variable. Points on the 5th- and 16th-
percentile lines correspond to scores of 75 and 85 on the transformed variable. 
Thus for each age group we have a set of percentile points, points on a fitted 
line, and a corresponding set of transformed values. By interpolation and 
extrapolation a transformed value corresponding to each original score value 
may be obtained and a conversion table prepared. 

Transformed scores obtained by this general method will be approximately 
normal with a mean of 100 and a standard deviation of 15. Any other 
appropriate mean and standard deviation may be used. The transformed 
scores are independent of age. The correlation between chronological age 
and transformed score is about zero. 
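
The general steps above may be sketched in code. This is an illustration under stated assumptions rather than a standard procedure: the data are artificial, straight lines are fitted by least squares, percentile points use a simple linear-interpolation convention, and all function names are hypothetical.

```python
from statistics import mean

PERCENTILES = [5, 16, 50, 84, 95]
TARGETS = [75, 85, 100, 115, 125]  # transformed values for mean 100, SD 15


def percentile_point(scores, p):
    """Percentile point by linear interpolation between ordered scores."""
    xs = sorted(scores)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_line(ages, values):
    """Least-squares slope and intercept of values on ages."""
    ma, mv = mean(ages), mean(values)
    sxy = sum((a - ma) * (v - mv) for a, v in zip(ages, values))
    sxx = sum((a - ma) ** 2 for a in ages)
    slope = sxy / sxx
    return slope, mv - slope * ma


def build_percentile_lines(groups):
    """Fit one line per percentile across the age groups."""
    ages = sorted(groups)
    return [fit_line(ages, [percentile_point(groups[a], p) for a in ages])
            for p in PERCENTILES]


def transform(score, age, lines):
    """Interpolate (or extrapolate) a transformed score for a given age."""
    pts = [slope * age + intercept for slope, intercept in lines]
    i = 0
    while i < len(pts) - 2 and score > pts[i + 1]:
        i += 1
    frac = (score - pts[i]) / (pts[i + 1] - pts[i])
    return TARGETS[i] + frac * (TARGETS[i + 1] - TARGETS[i])


# Artificial data: scores rise by 5 points per year of age (assumed).
base = list(range(20, 61))
groups = {age: [x + 5 * (age - 10) for x in base] for age in (10, 11, 12)}

lines = build_percentile_lines(groups)
older = transform(50, 12, lines)    # raw score 50 at age 12
younger = transform(50, 10, lines)  # the same raw score at age 10
print(round(older, 2), round(younger, 2))
```

With these artificial data a raw score of 50 is the median at age 12 and therefore transforms to about 100, while the same raw score at age 10 lies above the median for that age and transforms to a higher value. A practical conversion table would require more percentile lines and a check that the fitted lines do not cross.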

Many variants and refinements of this general method may be applied. 
Many investigators may prefer to use a larger number of percentile lines and 
equal standard-score units. 


EXERCISES 


1. What characteristics of a set of measurements are invariant under (a) a transformation 
to standard scores, (b) a normal transformation, (c) a regression transformation? 

2. State the difference between percentile points and percentile ranks. 

3. For the data of Table 14.1, compute (a) percentile points Pas, Рао, and Ps, (b) percentile 
ranks for scores 103, 123, and 136. 

4. For the data of Table 14.2, compute (a) percentile points Pio, Pao, and Ps, (b) percentile 
ranks for the scores 59, 74, and 82. 

5. Develop a T-score transformation for the data of Table 3.1. 

6. Develop a stanine transformation for the data of Table 3.1. 

7. The following are measures of motor skill under initial nonstress conditions and sub- 
sequent stress conditions for a sample of 12 individuals. 

Nonstress: 26 33 41 53 28 36 44 28 5 47 5 3 

Stress: 18 29 52 40 25 30 38 50 41 39 50 45 


Apply a regression transformation to these data. What purpose would such a trans- 
formation serve? 


CHAPTER 15


ANALYSIS OF VARIANCE: 
ONE-WAY CLASSIFICATION 


15.1. Its Nature and Purpose 


The analysis of variance is a technique for dividing the variation observed 
in experimental data into different parts, each part assignable to a known 
source, cause, or factor. We may assess the relative magnitude of variation 
resulting from different sources and ascertain whether a particular part of 
the variation is greater than expectation under the null hypothesis. The 
analysis of variance is inextricably associated with the design of experiments. 
Obviously, if we are to relate different parts of the variation to particular 
causal circumstances, experiments must be designed to permit this to occur 
in a logically rigorous fashion. 

The partitioning of variance is a common occurrence in statistics. The
particular body of technology known as the analysis of variance was developed
by R. A. Fisher and reported by him in 1923. Since that time it has found
wide application in many areas of experimentation. Its early applications
were in the field of agriculture. If the variance is understood as the square
of the standard deviation of a variable $X$, $s_x^2$, the analysis of variance
does not in fact divide this variance into additive parts. The method divides
the sum of squares $\sum (X - \bar{X})^2$ into additive parts. These are used
in the application of tests of significance to the data.

In its simplest form the analysis of variance is used to test the significance 
of the differences between the means of a number of different samples. We 
may wish to test the effects of k treatments. These may be different methods 
of memorizing nonsense syllables, different methods of instruction, or
different dosages of a drug. A different treatment is applied to each of the
$k$ samples, each sample being comprised of $n$ members. Members are assigned
to treatments at random. The means of the $k$ samples are calculated. The
null hypothesis is formulated that the samples are drawn from populations 
having the same mean. Assuming that the treatments applied are having 
no effect, some variation due to sampling fluctuation is expected between 
means. If the variation cannot reasonably be attributed to sampling error, 
we reject the null hypothesis and accept the alternative hypothesis that the
treatments applied are having an effect. With only two means, $k = 2$, this
approach leads to the same result as that obtained from the $t$ test for the
significance of the difference between means for independent samples.

Consider an agricultural experiment undertaken to compare yields of four 
varieties of wheat. Thirty-two experimental plots are prepared, and each 
of the four varieties grown in eight plots. Thus k = 4 and n = 8. Assume 
that appropriate precautions have been exercised to randomize uncontrolled 
factors such as variation in soil fertility from plot to plot. The yield for each 
plot is obtained, and the mean yield for each variety on the eight plots calcu- 
lated. Differences in yield reflect themselves in the variation in the four 
means. If this variation is small and can be explained by sampling error, 
the investigator has no grounds for rejecting the null hypothesis that no 
difference exists between the yields of the four varieties. If the variation 
between means is not small and of such magnitude that it could arise in 
random sampling in less than 1 or 5 per cent of cases, then the evidence is 
sufficient to warrant rejection of the null hypothesis and acceptance of the 
alternative hypothesis that the varieties differ in yield. 

In the above agricultural experiment the sampling unit is the plot. In 
psychological experimentation the analogue of the plot is usually either a 
human subject or an experimental animal. In an experiment on the relative 
efficacy of four different methods of memorizing nonsense syllables, four 
groups of subjects may be selected, a different method used on each group, 
and means on a measure of recall obtained for the four groups. A comparison
of these means provides information on the relative efficacy of the different 
methods, and the analysis of variance may be used to decide whether the 
variation between means is greater than sampling fluctuation would allow. 

The problem of testing the significance of the differences between a number 
of means results from experiments designed to study the variation in a 
dependent variable with variation in an independent variable. The inde- 
pendent variable may be varieties of wheat, methods of memorizing nonsense 
syllables, or different environmental conditions. The dependent variable 
may be crop yield, number of nonsense syllables recalled, or number of 
errors made by an animal in running a maze. Experiments which employ 
one independent variable are said to involve one basis of classification. The 
analysis of variance may be used in the analysis of data resulting from experi- 
ments which involve more than one basis of classification. For example, an 
experiment may be designed to permit the study both of varieties of wheat 
and types of fertilizer on crop yield. This experiment employs two inde- 
pendent variables. We wish to discover how crop yield depends on these 
two variables. The analysis of variance may be used to extract a part of 
the total variation resulting from the differences in varieties of wheat and 
another part resulting from differences in fertilizers, in addition to interaction 
and error components. A further example is a psychological experiment
designed to permit the study of the effects of both free-versus-restricted
environment and early-versus-late blindness on maze performance in the rat. 
Here we have two independent variables. Each variable has two categories. 
There are four combinations of conditions: free environment and early
blindness, free environment and late blindness, restricted environment and 
early blindness, restricted environment and late blindness. Four groups of 
experimental animals may be used, and one of the four conditions applied to 
each group. The analysis of variance may be applied to identify parts of 
the variation in maze performance assignable to the different environmental 
and blindness conditions, and other parts as well. Experiments may be 
designed to permit the simultaneous study of any number of experimental 
variables within practical limits. 

Let us proceed by considering in detail the simple case of a one-way- 
classification problem where the analysis of variance provides a composite 
test of the significance of the difference between a set of means. 


15.2. Notation for One-way Analysis of Variance 


Consider an experiment involving $k$ experimental treatments. The
treatments may be different dosages of a drug, different methods of
memorizing nonsense syllables, or different environmental variations in the
rearing of experimental animals. Each treatment is applied to a different
experimental group. Denote the number of members in the $k$ groups by
$n_1, n_2, \ldots, n_k$. The number of members in the $j$th group is $n_j$.
The total number of members in all groups combined is
$n_1 + n_2 + \cdots + n_k = N$. When the groups are of equal size we may write
$n_1 = n_2 = \cdots = n_k = n = N/k$.
The data may be represented as follows: 


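$$\begin{array}{cccc}
\text{Group 1} & \text{Group 2} & \cdots & \text{Group } k \\
\hline
X_{11} & X_{12} & \cdots & X_{1k} \\
X_{21} & X_{22} & \cdots & X_{2k} \\
X_{31} & X_{32} & \cdots & X_{3k} \\
\vdots & \vdots & & \vdots \\
X_{n_1 1} & X_{n_2 2} & \cdots & X_{n_k k} \\
\hline
\sum_{i=1}^{n_1} X_{i1} & \sum_{i=1}^{n_2} X_{i2} & \cdots & \sum_{i=1}^{n_k} X_{ik}
\end{array}$$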


Here a system of double subscripts is used. The first subscript identifies
the member of the group; the second identifies the group. Thus $X_{21}$
represents the measurement for the second member of the first group, $X_{32}$
represents the measurement for the third member of the second group, and so
on. In general, the symbol $X_{ij}$ means the $i$th member of the $j$th group.
Where the data for each group are tabulated in a separate column, the first
subscript identifies the row and the second the column. The sums of
measurements in the $k$ groups are represented by

$$\sum_{i=1}^{n_1} X_{i1}, \quad \sum_{i=1}^{n_2} X_{i2}, \quad \ldots, \quad \sum_{i=1}^{n_k} X_{ik}$$

We may denote the group means by $\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.k}$.
The symbol $\bar{X}_{.1}$ refers to the mean of the first column, $\bar{X}_{.2}$
the mean of the second column, and $\bar{X}_{.j}$ the mean of the $j$th column.
The convention is to use a dot to indicate the variable subscript over which
the summation extends. The mean of all the observations taken together may be
represented by the symbol $\bar{X}_{..}$, sometimes called the grand mean. In a
one-way classification the meaning associated with the various symbols is
quite clear without the use of the dot notation. In discussion of one-way
classification we shall therefore simplify the notation and represent the
group means by $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$ and the grand mean by
$\bar{X}$. The dot notation is necessary in the more complex applications of
the analysis of variance, and we shall return to it in Chap. 16.

The total variation in the data is represented by the sum of squares of
deviations of all the observations from the grand mean. The sum of squares
of deviations of the $n_1$ observations in the first group from the grand
mean is

$$\sum_{i=1}^{n_1} (X_{i1} - \bar{X})^2$$

and the sum of squares of deviations of the $n_j$ observations in the $j$th
group from the grand mean is

$$\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$

For $k$ groups each comprised of $n_j$ observations the total sum of squares of
deviations about $\bar{X}$ is

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$$

When the meaning is clearly understood from the context, it is common
practice to represent this total sum of squares by
$\sum (X_{ij} - \bar{X})^2$, or more simply by $\sum (X - \bar{X})^2$.


15.3. Partitioning the Sum of Squares


Simple algebra may be used to demonstrate that the total sum of squares
may be divided into two additive and independent parts, a within-group
sum of squares and a between-group sum of squares. We proceed by writing
the identity

$$(X_{ij} - \bar{X}) = (X_{ij} - \bar{X}_j) + (\bar{X}_j - \bar{X})$$

This identity states that the deviation of a particular score from the grand
mean is comprised of two parts, a deviation from the mean of the group to
which the score belongs $(X_{ij} - \bar{X}_j)$ and a deviation of the group
mean from the grand mean $(\bar{X}_j - \bar{X})$. We square this identity and
sum over the $n_j$ cases in the $j$th group to obtain

$$\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + \sum_{i=1}^{n_j} (\bar{X}_j - \bar{X})^2 + 2(\bar{X}_j - \bar{X}) \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)$$

The second term to the right requires the summation of a constant
$(\bar{X}_j - \bar{X})^2$ over all $n_j$ values of the $j$th group and may be
written $n_j(\bar{X}_j - \bar{X})^2$. The third term to the right disappears
because the sum of deviations about the mean $\bar{X}_j$ is zero. We obtain
thereby

$$\sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + n_j(\bar{X}_j - \bar{X})^2$$

This expression says that the sum of the squares of the deviations of the
$n_j$ observations in the $j$th group from the grand mean $\bar{X}$ is equal to
the sum of squares of deviations of the observations from the group mean plus
$n_j$ times the square of the difference between the group mean and the grand
mean. We now sum over the $k$ groups to obtain

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 + \sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2 \tag{15.1}$$

The term to the left is the total sum of squares: the sum of squares of
deviations of all the observations from the grand mean $\bar{X}$. The first
term to the right is the sum of squares within groups: the sum of squares of
deviations from the respective group means. The second and last term to the
right is the sum of squares between groups: the sum of squares of deviations
of the group means from the grand mean, each term $(\bar{X}_j - \bar{X})^2$
being weighted by $n_j$, the number of cases in the group. Thus the total sum
of squares is partitioned into two additive parts, a sum of squares within
groups, and a sum of squares between groups. These two parts are independent.
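
The additivity of the two parts can be verified directly from the definitions. The sketch below uses three small groups of unequal size; the scores are hypothetical and the function name is illustrative only.

```python
from statistics import mean


def partition(groups):
    """Return (total, within, between) sums of squares from the definitions."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    total = sum((x - grand) ** 2 for x in all_scores)
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return total, within, between


# Hypothetical scores for three groups of unequal size (assumed data).
groups = [[3, 5, 4], [6, 9, 8, 7], [2, 4]]

total, within, between = partition(groups)
print(round(total, 4), round(within + between, 4))
```

The two printed values should agree apart from rounding error, whatever scores are supplied.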


15.4. The Variance Estimates or Mean Squares 


Each sum of squares has an associated number of degrees of freedom.
The total number of observations is $n_1 + n_2 + \cdots + n_k = \sum n_j = N$.
The total sum of squares has $N - 1$ degrees of freedom. One degree of
freedom is lost by taking deviations about the grand mean. $N - 1$ of these
deviations are free to vary. The number of degrees of freedom associated
with the within-groups sum of squares is

$$(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1) = \sum_{j=1}^{k} n_j - k = N - k$$

The number of degrees of freedom for each group is $n_j - 1$. Hence the
number of degrees of freedom for $k$ groups is $\sum n_j - k$, or $N - k$. The
number of degrees of freedom associated with the between-groups sum of
squares is $k - 1$. We have $k$ means, and 1 degree of freedom is lost by
expressing the group means as deviations from the grand mean. The
degrees of freedom are additive:

$$\underbrace{N - 1}_{\text{total}} = \underbrace{(N - k)}_{\text{within}} + \underbrace{(k - 1)}_{\text{between}}$$


The within- and between-groups sums of squares are divided by their
associated degrees of freedom to obtain a within-groups variance estimate
$s_w^2$ and a between-groups variance estimate $s_b^2$. Thus

$$s_w^2 = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2}{N - k} \tag{15.2}$$

$$s_b^2 = \frac{\sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2}{k - 1} \tag{15.3}$$

The sums of squares and degrees of freedom are additive. The variance
estimates are not additive. The variance estimate is sometimes spoken of
as the mean square.


15.5. The Meaning of the Variance Estimates 


What meaning attaches to the variance estimates $s_w^2$ and $s_b^2$? Let us
assume that the $k$ samples are drawn from populations having the same
variance. The assumption is that
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$. If this
assumption is tenable, the expected value of $s_w^2$ is $\sigma^2$; that is,

$$E(s_w^2) = \sigma^2$$

Thus $s_w^2$ is an unbiased estimate of the population variance. It is an
estimate obtained by combining the data for the $k$ samples. It may be written
in the form

$$s_w^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2 + \cdots + \sum_{i=1}^{n_k} (X_{ik} - \bar{X}_k)^2}{n_1 + n_2 + \cdots + n_k - k} \tag{15.4}$$

The reader will recall that in applying the $t$ test to determine the
significance of the difference between two means for independent samples, an
unbiased estimate of the population variance was obtained by combining the
sums of squares about the means of the two samples and dividing this by the
total number of degrees of freedom. The within-group variance $s_w^2$ is an
estimate of precisely the same type. It is obtained by adding together the
sums of squares about the $k$ sample means and dividing this by the total
number of degrees of freedom. The variance estimate used in the $t$ test is
the particular case of $s_w^2$ which occurs when $k = 2$.

The expected value of $s_b^2$ may be shown to be

$$E(s_b^2) = \sigma^2 + \frac{\sum_{j=1}^{k} n_j(\mu_j - \mu)^2}{k - 1} \tag{15.5}$$

where $\mu_j$ and $\mu$ are population means. Under the null hypothesis

$$\mu_1 = \mu_2 = \cdots = \mu_k = \mu$$

and the second term to the right of the above expression is equal to zero.
Hence under the null hypothesis both $s_w^2$ and $s_b^2$ are estimates of the
population variance $\sigma^2$.

That $s_b^2$ is an estimate of $\sigma^2$ under the null hypothesis may be
illustrated by considering the particular situation where
$n_1 = n_2 = \cdots = n_k = n$. The between-group variance estimate may then
be written as

$$s_b^2 = \frac{n \sum_{j=1}^{k} (\bar{X}_j - \bar{X})^2}{k - 1}$$

This is $n$ times the variance of the $k$ means, or $n s_{\bar{x}}^2$. The error
variance of the sampling distribution of the arithmetic mean for samples of
size $n$ is given by $\sigma_{\bar{x}}^2 = \sigma^2/n$. Hence
$n\sigma_{\bar{x}}^2 = \sigma^2$. The quantity $n s_{\bar{x}}^2$ is an estimate
of $n\sigma_{\bar{x}}^2$, hence also of $\sigma^2$. Thus $s_b^2$ is an estimate
of $\sigma^2$.

Where the null hypothesis is false and the means of the populations from 
which the k samples are drawn differ one from another, the second term 
to the right of the expression for $E(s_b^2)$ is not equal to zero. It is a
measure of the variation of the separate population means $\mu_j$ from the
grand mean $\mu$.


To test the hypothesis $H_0\colon \mu_1 = \mu_2 = \cdots = \mu_k$, consider the
ratio $s_b^2/s_w^2$. This is an $F$ ratio. Under the null hypothesis this
ratio is expected to lie near unity, since $E(s_b^2) = E(s_w^2) = \sigma^2$.
If the population means differ from each other, $s_b^2/s_w^2$ will tend to be
greater than unity. If $s_b^2/s_w^2$ is found to be significantly greater than
unity, this may be construed to be evidence for the rejection of the null
hypothesis and for the acceptance of the alternative hypothesis that
differences exist between the population means. The significance of the $F$
ratio $s_b^2/s_w^2$ may be assessed with reference to the table of $F$
(Table D of the Appendix) with $k - 1$ degrees of freedom associated with the
numerator and $N - k$ degrees of freedom associated with the denominator.
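
The behavior of the two variance estimates may be illustrated by a small sampling experiment. This is a sketch under assumed conditions, not a result from the text: normal populations with $\sigma = 2$, four groups of eight cases, 2,000 replications, and a fixed random seed. The function names are illustrative.

```python
import random


def variance_estimates(groups):
    """Return (within, between) variance estimates for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    return ss_within / (n_total - k), ss_between / (k - 1)


def average_estimates(pop_means, n=8, sigma=2.0, reps=2000, seed=1):
    """Average the two estimates over repeated random samples."""
    rng = random.Random(seed)
    sum_w = sum_b = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in pop_means]
        w, b = variance_estimates(groups)
        sum_w += w
        sum_b += b
    return sum_w / reps, sum_b / reps


# Null hypothesis true: all population means equal (assumed values).
null_w, null_b = average_estimates([50, 50, 50, 50])
# Null hypothesis false: population means differ (assumed values).
alt_w, alt_b = average_estimates([48, 50, 50, 52])

print("null:", round(null_w, 2), round(null_b, 2))
print("alternative:", round(alt_w, 2), round(alt_b, 2))
```

When the population means are equal, both averages should fall close to $\sigma^2 = 4$. When the means differ, the within-groups average should remain near 4, while the between-groups average should rise toward $4 + 8(4 + 0 + 0 + 4)/3 \approx 25.3$, in line with the expression for $E(s_b^2)$.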


15.6. Computation Formulas 


The calculation of the required sums of squares may be simplified by the
use of computation formulas. To simplify the notation, denote the sum of
all the observations in the $j$th group by $T_j$. Thus

$$\sum_{i=1}^{n_j} X_{ij} = T_j$$

Denote the sum of all observations in the $k$ groups by $T$. Thus

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij} = T$$

The computation formulas are readily obtained. The formula for the total
sum of squares is

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \frac{T^2}{N} \tag{15.6}$$

Thus we find the sum of squares of all observations and subtract $T^2/N$.
The within-groups sum of squares is

$$\sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} \frac{T_j^2}{n_j} \tag{15.7}$$

The quantity $T_j^2/n_j$ is the square of the sum of the $j$th group divided by
the number of cases in that group. These values are calculated and summed
over the $k$ groups. The between-groups sum of squares is

$$\sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2 = \sum_{j=1}^{k} \frac{T_j^2}{n_j} - \frac{T^2}{N} \tag{15.8}$$

The above formulas are generally applicable to groups of unequal or
equal size. In the particular case where the groups are of equal size and
$n_1 = n_2 = \cdots = n_k = n$, the within-groups sum of squares may be
written as

$$\sum_{j=1}^{k} \sum_{i=1}^{n} X_{ij}^2 - \frac{\sum_{j=1}^{k} T_j^2}{n} \tag{15.9}$$

and the between-groups sum of squares becomes

$$\frac{\sum_{j=1}^{k} T_j^2}{n} - \frac{T^2}{N} \tag{15.10}$$


15.7. Summary 


Table 15.1 presents in summary form the formulas hitherto discussed.

TABLE 15.1

ANALYSIS OF VARIANCE: ONE-WAY CLASSIFICATION
SUMMARY OF FORMULAS

$$\begin{array}{lcccc}
\text{Source of variation} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Variance estimate: mean square} & \text{Expectation} \\
\hline
\text{Between} & \sum_{j=1}^{k} n_j(\bar{X}_j - \bar{X})^2 & k - 1 & s_b^2 & \sigma^2 + \dfrac{\sum_{j=1}^{k} n_j(\mu_j - \mu)^2}{k - 1} \\
\text{Within} & \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 & N - k & s_w^2 & \sigma^2 \\
\text{Total} & \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 & N - 1 & &
\end{array}$$

$$\begin{array}{lcc}
\text{Source of variation} & \text{Computation formulas} & \text{Computation formulas: equal groups} \\
\hline
\text{Between} & \sum_{j=1}^{k} \dfrac{T_j^2}{n_j} - \dfrac{T^2}{N} & \dfrac{\sum_{j=1}^{k} T_j^2}{n} - \dfrac{T^2}{N} \\
\text{Within} & \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \sum_{j=1}^{k} \dfrac{T_j^2}{n_j} & \sum_{j=1}^{k} \sum_{i=1}^{n} X_{ij}^2 - \dfrac{\sum_{j=1}^{k} T_j^2}{n} \\
\text{Total} & \sum_{j=1}^{k} \sum_{i=1}^{n_j} X_{ij}^2 - \dfrac{T^2}{N} & \sum_{j=1}^{k} \sum_{i=1}^{n} X_{ij}^2 - \dfrac{T^2}{N}
\end{array}$$

In summary, to test the significance of the difference between $k$ means
using the analysis of variance, the following steps are involved:

1. Partition the total sum of squares into two components, a within-groups
and a between-groups sum of squares, using the appropriate computation
formulas.

2. Divide these sums of squares by the associated number of degrees of
freedom to obtain $s_w^2$ and $s_b^2$, the within- and between-groups variance
estimates.

3. Calculate the $F$ ratio $s_b^2/s_w^2$ and refer this to the table of $F$
(Table D of the Appendix).

4. If the probability of obtaining the observed $F$ value is small, say, less
than .05 or .01, under the null hypothesis, reject that hypothesis.


15.8. Illustrative Example: One-way Classification


Table 15.2 shows the number of nonsense syllables recalled by four groups
of subjects using four different methods of presentation. Fictitious data
are used here for simplicity of illustration. The sums of squares have been
calculated using the computation formulas. The data are presented in
summary form in Table 15.3. The number of groups is 4. The number of
degrees of freedom associated with the between-groups sum of squares is
$k - 1 = 4 - 1 = 3$. The number of degrees of freedom associated with
the within-groups sum of squares is $N - k = 26 - 4 = 22$. The number of
degrees of freedom associated with the total is $N - 1 = 26 - 1 = 25$. The
between and within sums of squares are divided by the associated degrees of
freedom to obtain the variance estimates $s_b^2$ and $s_w^2$.

TABLE 15.2

COMPUTATION FOR THE ANALYSIS OF VARIANCE: ONE-WAY CLASSIFICATION
NUMBER OF NONSENSE SYLLABLES CORRECTLY RECALLED UNDER FOUR METHODS
OF PRESENTATION

                                    Method
                           I        II       III       IV
                           5         9         8        1
                           7        11         6        3
                           6         8         9        4
                           3         7         5        5
                           9         7         7        1
                           7                   4        4
                           4                   4
                           2

$n_j$                      8         5         7        6     $N = 26$
$T_j$                     43        42        43       18     $T = 146$
$\bar{X}_j$             5.38      8.40      6.14     3.00     $T^2/N = 819.85$
$\sum_i X_{ij}^2$        269       364       287       68     $\sum_j \sum_i X_{ij}^2 = 988$
$T_j^2/n_j$           231.13    352.80    264.14    54.00     $\sum_j T_j^2/n_j = 902.07$

Sum of squares

Between    902.07 - 819.85 =  82.22
Within     988    - 902.07 =  85.93
Total      988    - 819.85 = 168.15

TABLE 15.3

ANALYSIS OF VARIANCE FOR DATA OF TABLE 15.2

Source of                        Degrees of     Variance
variation      Sum of squares    freedom        estimate
Between             82.22            3          27.41 = $s_b^2$
Within              85.93           22           3.91 = $s_w^2$
Total              168.15           25

$F = 7.01$

The $F$ ratio is $s_b^2/s_w^2 = 27.41/3.91 = 7.01$. Consulting a table of $F$
with $df = 3$ associated with the numerator and $df = 22$ with the denominator,
we find that the value of $F$ required for significance at the .01 level is
4.82. Since the observed $F$ exceeds this value, we may safely conclude that
the method of presentation affects the number of nonsense syllables recalled.
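
The arithmetic of this example can be reproduced with the computation formulas of Sec. 15.6. The sketch below is illustrative only. The scores are those of Table 15.2, with each score assigned to the method whose column total and number of cases it satisfies; the function name and the dictionary keys are assumptions of the sketch.

```python
# Scores from Table 15.2, one list per method of presentation.
groups = {
    "I": [5, 7, 6, 3, 9, 7, 4, 2],
    "II": [9, 11, 8, 7, 7],
    "III": [8, 6, 9, 5, 7, 4, 4],
    "IV": [1, 3, 4, 5, 1, 4],
}


def one_way_anova(groups):
    """One-way analysis of variance by the computation formulas."""
    samples = list(groups.values())
    k = len(samples)
    n_total = sum(len(g) for g in samples)
    grand_total = sum(sum(g) for g in samples)            # T
    sum_x2 = sum(x * x for g in samples for x in g)       # sum of squared scores
    sum_tj2_nj = sum(sum(g) ** 2 / len(g) for g in samples)
    correction = grand_total ** 2 / n_total               # T^2 / N
    ss_between = sum_tj2_nj - correction
    ss_within = sum_x2 - sum_tj2_nj
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": sum_x2 - correction,
        "df_between": k - 1,
        "df_within": n_total - k,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(name, round(value, 2))
```

Rounded to two decimals, the sums of squares and variance estimates agree with Tables 15.2 and 15.3. The $F$ ratio computed without intermediate rounding is about 7.02; the value 7.01 in the text results from dividing the rounded estimates 27.41 and 3.91.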


15.9. Comparison of Means Two at a Time Following an F Test


If the $F$ test does not lead to the rejection of the null hypothesis, no
further analysis of the data is required. When the null hypothesis is
rejected, the investigator may wish to compare means two at a time, using a
$t$ test. The differences between certain pairs of means may be significant,
while others may not be.

In applying the $t$ test to compare means two at a time, the within-group
variance estimate $s_w^2$ may be used. This estimate is based on a larger
number of degrees of freedom than a variance estimate based on any two
groups. In Table 15.2 the means for samples I and II are 5.38 and 8.40,
respectively. The within-group variance, based on 22 degrees of freedom,
is 3.91. The numbers of cases in the two groups are 8 and 5. The $t$ test is
then

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_w^2/n_1 + s_w^2/n_2}} = \frac{5.38 - 8.40}{\sqrt{3.91/8 + 3.91/5}} = -2.68$$

We consult the table of $t$ with $df = 22$. The values required for
significance at the 5 and 1 per cent levels are 2.07 and 2.82, respectively.
The observed $t$, taken without regard to sign, is between the 5 and 1 per
cent levels.

Some doubt attaches to the above procedure. Assuming the null hypothesis,
in any large number of comparisons of pairs of means, approximately
5 per cent of differences would prove to be significant at the 5 per cent
level. This suggests that in applying $t$ tests following an $F$ test a more
rigorous basis than usual is required for rejection of the null hypothesis.
One suggestion is that instead of using the 5 per cent level of rejection we
use the $10/k(k - 1)$ per cent level, where $k$ is the number of groups. For
the data of Table 15.2 the critical level of rejection becomes the .83 per
cent level. The general problem of applying a $t$ test following an $F$ test
has been investigated by Tukey (1949). His procedures are described in detail
by Edwards (1954).
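
The pairwise comparisons and the adjusted level of rejection may be sketched as follows. The means, group sizes, and within-group variance are the rounded values from Tables 15.2 and 15.3; the function names are illustrative, and the sketch does not implement Tukey's procedures.

```python
from itertools import combinations
from math import sqrt

# Rounded summary values from Tables 15.2 and 15.3.
means = {"I": 5.38, "II": 8.40, "III": 6.14, "IV": 3.00}
sizes = {"I": 8, "II": 5, "III": 7, "IV": 6}
MS_WITHIN = 3.91  # within-group variance estimate, 22 degrees of freedom


def pairwise_t(a, b):
    """t ratio for two groups, using the within-group variance estimate."""
    standard_error = sqrt(MS_WITHIN / sizes[a] + MS_WITHIN / sizes[b])
    return (means[a] - means[b]) / standard_error


def adjusted_level_percent(k):
    """Level of rejection, in per cent, suggested for k groups."""
    return 10 / (k * (k - 1))


t_values = {pair: pairwise_t(*pair) for pair in combinations(means, 2)}
for (a, b), t in t_values.items():
    print(a, "vs", b, round(t, 2))
print("level (per cent):", round(adjusted_level_percent(len(means)), 2))
```

Each of the six $t$ values would be referred to the table of $t$ with $df = 22$, using the adjusted level in place of the usual 5 per cent level.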


15.10. The Analysis of Variance with Two Groups 


With two groups only, the significance of the difference between means
may be tested using either a $t$ test or the analysis of variance. These
procedures lead to the same result. Where $k = 2$ it may be readily shown that
$\sqrt{F} = t$.

Consider a situation where $k = 2$ and $n_1 = n_2 = n$. Under these
circumstances the between-groups variance estimate $s_b^2$ is

$$s_b^2 = \frac{n(\bar{X}_1 - \bar{X})^2 + n(\bar{X}_2 - \bar{X})^2}{2 - 1}$$

For groups of equal size the grand mean $\bar{X}$ is halfway between the two
group means $\bar{X}_1$ and $\bar{X}_2$. Thus
$(\bar{X}_1 - \bar{X}) = -(\bar{X}_2 - \bar{X}) = \tfrac{1}{2}(\bar{X}_1 - \bar{X}_2)$
and
$(\bar{X}_1 - \bar{X})^2 + (\bar{X}_2 - \bar{X})^2 = \tfrac{1}{2}(\bar{X}_1 - \bar{X}_2)^2$.
We may therefore write

$$s_b^2 = \frac{n(\bar{X}_1 - \bar{X}_2)^2}{2}$$

When $k = 2$ the within-groups variance estimate $s_w^2$ is the unbiased
variance estimate $s^2$, obtained by adding the two sums of squares about the
means of the two samples and dividing by the total number of degrees of
freedom (Sec. 15.5). Hence

$$F = \frac{s_b^2}{s_w^2} = \frac{n(\bar{X}_1 - \bar{X}_2)^2}{2s^2}$$

and

$$\sqrt{F} = \frac{\bar{X}_1 - \bar{X}_2}{s\sqrt{(1/n) + (1/n)}} = t \tag{15.11}$$

Thus $\sqrt{F} = t$ and $F = t^2$. To illustrate, let $n_1 = n_2 = 8$. In
applying the analysis of variance with $df = 1$ associated with the numerator
and $df = 14$ associated with the denominator of the $F$ ratio, an $F$ of 4.60
is required for significance at the .05 level. The corresponding $t$ for
$df = 14$ required for significance at the .05 level is
$\sqrt{4.60} = 2.145$. The $t$ test may be considered a particular case of the
$F$ test. It is a particular case which arises when $k = 2$.

In the above discussion we have considered two groups of equal size.
The result $\sqrt{F} = t$ is, however, quite general and holds when $n_1$ and
$n_2$ are unequal. For unequal groups the algebraic development is a bit more
cumbersome than that given here. The grand mean does not fall midway
between the two group means.
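
The general result for unequal groups can be checked numerically without the algebra. The sketch below uses two hypothetical groups of 6 and 4 cases; the function names are illustrative.

```python
from math import sqrt


def summary(a, b):
    """Group means and the pooled within-groups variance estimate."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_within = (sum((x - mean_a) ** 2 for x in a)
                 + sum((x - mean_b) ** 2 for x in b))
    return mean_a, mean_b, ss_within / (len(a) + len(b) - 2)


def f_ratio(a, b):
    """F ratio for two groups: between-groups over within-groups estimate."""
    mean_a, mean_b, s2_within = summary(a, b)
    grand = (sum(a) + sum(b)) / (len(a) + len(b))
    ss_between = (len(a) * (mean_a - grand) ** 2
                  + len(b) * (mean_b - grand) ** 2)
    return (ss_between / (2 - 1)) / s2_within


def pooled_t(a, b):
    """t ratio for independent samples with a pooled variance estimate."""
    mean_a, mean_b, s2_within = summary(a, b)
    return (mean_a - mean_b) / sqrt(s2_within * (1 / len(a) + 1 / len(b)))


# Hypothetical scores for two groups of unequal size (assumed data).
group_1 = [12, 15, 11, 14, 13, 16]
group_2 = [9, 10, 12, 8]

f_value = f_ratio(group_1, group_2)
t_value = pooled_t(group_1, group_2)
print(round(f_value, 4), round(t_value ** 2, 4))
```

The two printed values should coincide, so that $F = t^2$ even though the grand mean no longer falls midway between the group means. The sign of $t$ depends on the order of the groups; $\sqrt{F}$ corresponds to its absolute value.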


15.11. Assumptions Underlying the Analysis of Variance 


In the mathematical development of the analysis of variance a number of 
assumptions are made. Questions may be raised about the nature of these
assumptions and the extent to which the failure of the data to satisfy them
leads to the drawing of invalid inferences. 

One assumption is that the distributions of the variables in the populations 
from which the samples are drawn are normal. For large samples the 
normality of the distributions may be tested using a test of goodness of fit, 
although in practice this is rarely done. When the samples are fairly small, 
it is usually not possible to rigorously demonstrate lack of normality in the 
data. Unless there is reason to suspect a fairly extreme departure from 
normality, it is probable that the conclusions drawn from the data using an 
F test will not be seriously affected. In general, the effect of departures 
from normality is to make the results appear somewhat more significant than 
they are. Consequently, where a fairly gross departure from normality 
occurs, a somewhat more rigorous level of confidence than usual may be 
employed. For a thorough discussion of this problem the reader is referred
to Lindquist (1953). 

A further assumption in the application of the analysis of variance is that 
the variances in the populations from which the samples are drawn are equal. 
This is known as homogeneity of variance. A variety of tests of homo- 
geneity of variance may be applied. These are discussed in more advanced 
texts (Johnson, 1949). Moderate departures from homogeneity should not 
seriously affect the inferences drawn from the data. Gross departures from 
homogeneity may lead to results which are seriously in error. Under 
certain circumstances a transformation of the variable, which leads to greater 
uniformity of variance, may be used. Under other circumstances it may be 
possible to use a nonparametric procedure. 

A further assumption is that the effects of various factors on the total 
variation are additive, as distinct from, say, multiplicative. The basic 
model underlying the analysis of variance is that a given observation may be 
partitioned into independent and additive bits, each bit resulting from an 
identifiable source. In most situations there are no grounds to suspect 
the validity of this model. 

With most sets of real data the assumptions underlying the analysis of 
variance are, at best, only roughly satisfied. The raw data of experiments 
frequently do not exhibit the characteristics which the mathematical models 
require. One advantage of the analysis of variance is that reasonable 
departures from the assumptions of normality and homogeneity may occur 
without seriously affecting the validity of the inferences drawn from the
data.


EXERCISES 


1. How many degrees of freedom are associated with the variation in the data for (a) a
comparison of two means for independent samples, each containing 20 cases, (b) a
comparison of four means for independent samples, each containing 14 cases, (c) a
comparison of four means for independent samples of size 10, 16, 18, and 11, respectively?




2. The following are error scores on a psychomotor test for four groups of subjects tested 
under four experimental conditions: 


Group    Error scores                    $\bar{X}_j$

I        16   7  19  24  31              19.40
II       24   6  15  25  32  24  29      22.14
III      16  15  18  19   6  13  18      15.00
IV       25  19  16  17  42  45          27.33


Apply the analysis of variance to test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$.
3. Apply the analysis of variance to test the significance of the difference between means for 
the following data: 


                              I        II       III
n                             10       10       10
$\bar{X}_j$                   7.40     8.30     10.56
$\sum_{i=1}^{n} X_{ij}^2$     649      755      1,263


4. What assumptions underlie the analysis of variance? 


CHAPTER 16 


ANALYSIS OF VARIANCE: 
TWO-WAY CLASSIFICATION 


16.1. Introduction 


Experiments may be designed to permit the simultaneous investigation of 
two experimental variables. Such experiments involve two bases of classi- 
fication. To illustrate, assume that an investigator wishes to study the 
effects of two methods of presenting nonsense syllables on recall after 5 min, 
1 hr, and 24 hr. One experimental variable is method of presentation, the 
other the interval between presentation and recall. There are six combina- 
tions of experimental conditions. One method of conducting such an experi- 
ment is to select a group of subjects and allocate these at random to the 
experimental conditions, an equal number being assigned to each condition. 
With, say, 10 subjects allocated to each experimental condition, the total 
number of subjects will be 2 × 3 × 10 = 60. The data may be arranged in 
a table containing two rows and three columns. The rows correspond to 
methods, the columns to time intervals. The 10 observations for each group 
may be entered in each cell of the table. Differences in the means of the 
rows result from differences in recall under the two methods of presentation. 
Differences in the means of the columns result from differences in recall after 
the three time intervals. 
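
The random allocation described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `allocate`, the subject numbers 1 to 60, the condition labels, and the fixed seed are hypothetical and do not come from the text.

```python
import random

def allocate(subjects, methods, intervals, seed=None):
    """Assign subjects at random, in equal numbers, to every method-by-interval cell."""
    cells = [(m, t) for m in methods for t in intervals]
    if len(subjects) % len(cells) != 0:
        raise ValueError("the subjects cannot be divided equally among the cells")
    shuffled = list(subjects)
    random.Random(seed).shuffle(shuffled)   # put the subjects in random order
    per_cell = len(shuffled) // len(cells)
    # Consecutive blocks of the shuffled list go to consecutive cells.
    return {cell: shuffled[k * per_cell:(k + 1) * per_cell]
            for k, cell in enumerate(cells)}

# 2 methods x 3 recall intervals x 10 subjects per cell = 60 subjects.
design = allocate(range(1, 61), ["method 1", "method 2"],
                  ["5 min", "1 hr", "24 hr"], seed=0)
```

Each key of `design` is one of the six treatment combinations, and each value holds the 10 subjects assigned to that cell of the 2 × 3 table.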

Experiments with two-way classification may be conducted with only one 
sampling unit, and measurement, for each experimental condition. With 
one measurement for each experimental condition the total sum of squares 
is partitioned into three parts, a between-rows, a between-columns, and an 
interaction sum of squares. With more than one measurement for each 
experimental condition, the total sum of squares is partitioned into four 
parts, a between-rows, a between-columns, an interaction sum of squares, 
and a within-cells sum of squares. Each sum of squares has an associated 
number of degrees of freedom. By dividing the sums of squares by the 
associated degrees of freedom four variance estimates are obtained. These 
variance estimates are used to test the significance of the differences between 
row means, column means, and, with more than one measurement per cell, 
the interaction effect. 



16.2. Notation for Two-way Analysis of Variance 


Consider an experiment involving R experimental treatments of one 
variable and C experimental treatments of another variable. The number of 
treatment combinations is RC. Let us consider the particular case where 
we have one sampling unit, and one measurement, for each of the RC treat- 
ment combinations. The total number of measurements is RC = N. The 
data may be represented as follows: 


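$$
\begin{array}{c|cccccc|c}
 & 1 & 2 & \cdots & c & \cdots & C & \text{Row mean} \\ \hline
1 & X_{11} & X_{12} & \cdots & X_{1c} & \cdots & X_{1C} & \bar{X}_{1.} \\
2 & X_{21} & X_{22} & \cdots & X_{2c} & \cdots & X_{2C} & \bar{X}_{2.} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
r & X_{r1} & X_{r2} & \cdots & X_{rc} & \cdots & X_{rC} & \bar{X}_{r.} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
R & X_{R1} & X_{R2} & \cdots & X_{Rc} & \cdots & X_{RC} & \bar{X}_{R.} \\ \hline
\text{Column mean} & \bar{X}_{.1} & \bar{X}_{.2} & \cdots & \bar{X}_{.c} & \cdots & \bar{X}_{.C} & \bar{X}_{..}
\end{array}
$$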

Double subscripts are used. The first subscript identifies the row; the
second subscript identifies the column. Thus $X_{32}$ is the measurement in
the third row and the second column. In general, the symbol $X_{rc}$ denotes
the measurement in the rth row and cth column, where r = 1, 2, ..., R
and c = 1, 2, ..., C. A dot notation is used to identify means. The
symbol $\bar{X}_{1.}$ refers to the mean of the first row, $\bar{X}_{2.}$ the mean of the second
row, and $\bar{X}_{r.}$ the mean of the rth row. Similarly, $\bar{X}_{.1}$ refers to the mean of
the first column, $\bar{X}_{.2}$ the mean of the second column, and $\bar{X}_{.c}$ the mean of
the cth column. The grand mean, the mean of all N observations, is $\bar{X}_{..}$.
The total sum of squares of deviations about the grand mean is given by

$$\sum_{r=1}^{R} \sum_{c=1}^{C} (X_{rc} - \bar{X}_{..})^2$$


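Consider now a situation where we have n sampling units and n measurements
for each of the RC treatment combinations. The total number of
measurements is nRC = N. Where R = 2, C = 4, and n = 3, the data
may be represented as follows:

$$
\begin{array}{c|cccc|c}
 & 1 & 2 & 3 & 4 & \text{Row mean} \\ \hline
1 & X_{111}, X_{112}, X_{113} & X_{121}, X_{122}, X_{123} & X_{131}, X_{132}, X_{133} & X_{141}, X_{142}, X_{143} & \bar{X}_{1..} \\
2 & X_{211}, X_{212}, X_{213} & X_{221}, X_{222}, X_{223} & X_{231}, X_{232}, X_{233} & X_{241}, X_{242}, X_{243} & \bar{X}_{2..} \\ \hline
\text{Column mean} & \bar{X}_{.1.} & \bar{X}_{.2.} & \bar{X}_{.3.} & \bar{X}_{.4.} & \bar{X}_{...}
\end{array}
$$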




Triple subscripts are used. The first subscript identifies the row, the
second the column, and the third the measurement within the cell. Thus
$X_{231}$ means the measurement for the first individual in the second row and
the third column. In general, $X_{rci}$ denotes the measurement for the ith
individual in the rth row and cth column, where i = 1, 2, ..., n. Row,
column, and cell means are identified by a dot notation. The mean of all
the observations in the first row is $\bar{X}_{1..}$. The mean of all the observations
in the rth row is $\bar{X}_{r..}$. Similarly, the mean of the first column is $\bar{X}_{.1.}$ and of
the cth column $\bar{X}_{.c.}$. The mean of all the observations in the cell corresponding
to the rth row and cth column is $\bar{X}_{rc.}$. The mean of all nRC observations,
the grand mean, is $\bar{X}_{...}$. The total sum of squares of all observations about
the grand mean is

$$\sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{...})^2$$

The sum of squares of deviations about the grand mean, both where n = 1
and n > 1, is partitioned into additive components.
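
The dot notation for means can be made concrete with a short Python sketch. The scores below are hypothetical (R = 2, C = 4, n = 3), and the variable names are assumptions chosen to mirror the symbols in the text.

```python
from statistics import mean

# Hypothetical scores X[r][c][i]: R = 2 rows, C = 4 columns, n = 3 per cell.
X = [
    [[3, 5, 4], [6, 8, 7], [2, 4, 3], [9, 7, 8]],
    [[5, 6, 7], [4, 4, 7], [8, 9, 7], [1, 3, 2]],
]
R, C, n = len(X), len(X[0]), len(X[0][0])

# Cell means: average over the third subscript only.
cell_means = [[mean(X[r][c]) for c in range(C)] for r in range(R)]
# Row means: average over columns and individuals within a row.
row_means = [mean(x for c in range(C) for x in X[r][c]) for r in range(R)]
# Column means: average over rows and individuals within a column.
col_means = [mean(x for r in range(R) for x in X[r][c]) for c in range(C)]
# Grand mean: average of all nRC observations.
grand_mean = mean(x for r in range(R) for c in range(C) for x in X[r][c])

# Total sum of squares of all observations about the grand mean.
total_ss = sum((x - grand_mean) ** 2
               for r in range(R) for c in range(C) for x in X[r][c])
```

Because every cell holds the same number of observations, the grand mean equals the mean of the row means and also the mean of the column means.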


16.3. Partitioning the Sum of Squares 


With one measurement only for each of the RC treatment combinations, 
the total sum of squares may be partitioned into three additive components, 
a between-rows, a between-columns, and an interaction sum of squares. We 
proceed by writing the identity 


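$$(X_{rc} - \bar{X}_{..}) = (\bar{X}_{r.} - \bar{X}_{..}) + (\bar{X}_{.c} - \bar{X}_{..}) + (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X}_{..})$$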


This identity states that the deviation of an observation from the grand mean 
may be viewed as composed of three parts, a deviation of the row mean from 
the grand mean, a deviation of the column mean from the grand mean, and 
a remainder, or residual term, known as an interaction term. By squaring 
both sides of the above identity, an expression is obtained containing six 
terms. This may be summed over R rows and C columns. Three of these
terms conveniently vanish, because they contain a sum of deviations about
a mean, which, of course, is zero. The resulting total sum of squares may
be written as


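$$\sum_{r=1}^{R} \sum_{c=1}^{C} (X_{rc} - \bar{X}_{..})^2 = C \sum_{r=1}^{R} (\bar{X}_{r.} - \bar{X}_{..})^2 + R \sum_{c=1}^{C} (\bar{X}_{.c} - \bar{X}_{..})^2 + \sum_{r=1}^{R} \sum_{c=1}^{C} (X_{rc} - \bar{X}_{r.} - \bar{X}_{.c} + \bar{X}_{..})^2$$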


The first term to the right is C times the sum of squares of deviations of row 
means from the grand mean. This is the between-rows sum of squares.
It describes the variation in row means. The second term is R times the
sum of squares of deviations of column means from the grand mean. This 
is the between-columns sum of squares. It describes variation in column 
means. The third term is a residual, or interaction, sum of squares. The 
meaning of the interaction term is discussed in detail in Sec. 16.5. 
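
The three-part partition for a single entry per cell can be checked numerically. The following Python sketch uses a hypothetical 3 × 4 table of scores; the function name `partition_single` is an assumption made for illustration.

```python
def partition_single(X):
    """Partition the total sum of squares for an R x C table with one entry per cell."""
    R, C = len(X), len(X[0])
    grand = sum(map(sum, X)) / (R * C)
    row_means = [sum(row) / C for row in X]
    col_means = [sum(X[r][c] for r in range(R)) / R for c in range(C)]
    # Between rows: C times the squared deviations of row means.
    ss_rows = C * sum((m - grand) ** 2 for m in row_means)
    # Between columns: R times the squared deviations of column means.
    ss_cols = R * sum((m - grand) ** 2 for m in col_means)
    # Interaction: squared residuals after removing row and column deviations.
    ss_inter = sum((X[r][c] - row_means[r] - col_means[c] + grand) ** 2
                   for r in range(R) for c in range(C))
    ss_total = sum((x - grand) ** 2 for row in X for x in row)
    return ss_rows, ss_cols, ss_inter, ss_total

# Hypothetical scores for 3 rows and 4 columns.
scores = [[12, 15, 11, 18], [9, 14, 10, 13], [16, 17, 12, 20]]
ss_rows, ss_cols, ss_inter, ss_total = partition_single(scores)
```

Apart from rounding error, `ss_rows + ss_cols + ss_inter` equals `ss_total`, which is the additivity the identity asserts.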

With n measurements for each of the RC treatment combinations the
total sum of squares may be partitioned into four additive components.
These are a between-rows, a between-columns, an interaction, and a within-
cells sum of squares. In this situation we write the identity

$$(X_{rci} - \bar{X}_{...}) = (\bar{X}_{r..} - \bar{X}_{...}) + (\bar{X}_{.c.} - \bar{X}_{...}) + (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...}) + (X_{rci} - \bar{X}_{rc.})$$

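This expression may be squared and summed over rows, columns, and within
cells. All but four terms vanish, and the resulting total sum of squares may
be written as

$$
\begin{aligned}
\sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{...})^2
 ={}& nC \sum_{r=1}^{R} (\bar{X}_{r..} - \bar{X}_{...})^2
 + nR \sum_{c=1}^{C} (\bar{X}_{.c.} - \bar{X}_{...})^2 \\
 &+ n \sum_{r=1}^{R} \sum_{c=1}^{C} (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...})^2 \\
 &+ \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{rc.})^2
\end{aligned}
\tag{16.1}
$$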


The first term to the right is descriptive of the variation of row means, the 
second of column means, and the third of interaction. The fourth term is 
the within-cells sum of squares. It is the sum of squares of the deviations of 
observations from the means of the cells to which they belong. 
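
The four-part partition with n entries per cell can be verified in the same way. This is a sketch on hypothetical scores (R = 2, C = 3, n = 3); the function name `partition_replicated` and the dictionary keys are assumptions.

```python
def partition_replicated(X):
    """Partition the total sum of squares for scores X[r][c][i], n per cell."""
    R, C, n = len(X), len(X[0]), len(X[0][0])
    cell = [[sum(X[r][c]) / n for c in range(C)] for r in range(R)]
    # With equal cell sizes, row and column means are means of cell means.
    row = [sum(cell[r]) / C for r in range(R)]
    col = [sum(cell[r][c] for r in range(R)) / R for c in range(C)]
    grand = sum(row) / R
    return {
        "rows": n * C * sum((m - grand) ** 2 for m in row),
        "columns": n * R * sum((m - grand) ** 2 for m in col),
        "interaction": n * sum((cell[r][c] - row[r] - col[c] + grand) ** 2
                               for r in range(R) for c in range(C)),
        "within cells": sum((x - cell[r][c]) ** 2
                            for r in range(R) for c in range(C) for x in X[r][c]),
        "total": sum((x - grand) ** 2
                     for r in range(R) for c in range(C) for x in X[r][c]),
    }

# Hypothetical scores: 2 rows, 3 columns, 3 observations per cell.
X = [
    [[3, 5, 4], [6, 8, 7], [2, 4, 3]],
    [[5, 6, 7], [4, 4, 7], [8, 9, 7]],
]
ss = partition_replicated(X)
parts = ss["rows"] + ss["columns"] + ss["interaction"] + ss["within cells"]
```

Here `parts` agrees with `ss["total"]` up to rounding error, as the partition requires.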


16.4. Variance Estimates or Mean Squares 


With a single entry in each cell, n = 1 and RC = N. The number
of degrees of freedom associated with the total sum of squares is

$$RC - 1 = N - 1$$


The numbers of degrees of freedom associated with the row and column sums of
squares are $R - 1$ and $C - 1$, respectively. The number of degrees of freedom
associated with the interaction sum of squares is $(R - 1)(C - 1)$. The degrees of
freedom are additive, and

$$\underbrace{N - 1}_{\text{total}} = \underbrace{(R - 1)}_{\text{row}} + \underbrace{(C - 1)}_{\text{column}} + \underbrace{(R - 1)(C - 1)}_{\text{interaction}}$$




The sums of squares are divided by the associated degrees of freedom to 
obtain three variance estimates, or mean squares. The between-rows, 
between-columns, and interaction variance estimates are $s_r^2$, $s_c^2$, and $s_{rc}^2$,
respectively. 

With n entries in each cell, where n > 1, the total number of observations
is nRC = N. The number of degrees of freedom associated with the total
sum of squares is $nRC - 1 = N - 1$. The numbers of degrees of freedom
associated with row, column, and interaction sums of squares are $R - 1$,
$C - 1$, and $(R - 1)(C - 1)$, respectively. The number of degrees of freedom
associated with the within-cells sum of squares is $nRC - RC = RC(n - 1)$.
Because the deviations are taken about the cell means, 1 degree of freedom is
lost for each cell. In each cell $n - 1$ deviations are free to vary. The
number of degrees of freedom for RC cells, therefore, is $RC(n - 1)$. The
degrees of freedom are additive. The sums of squares are divided by the
associated degrees of freedom to obtain the variance estimates, or mean
squares.
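
The bookkeeping of degrees of freedom can be written as a small Python function. This is a sketch only; the function name `degrees_of_freedom` and the dictionary keys are assumptions.

```python
def degrees_of_freedom(R, C, n):
    """Degrees of freedom for a two-way classification with n entries per cell."""
    df = {
        "rows": R - 1,
        "columns": C - 1,
        "interaction": (R - 1) * (C - 1),
    }
    if n > 1:
        # One degree of freedom is lost for each of the RC cell means.
        df["within cells"] = R * C * (n - 1)
    df["total"] = n * R * C - 1
    return df

df = degrees_of_freedom(R=2, C=3, n=10)
# The component degrees of freedom add up to the total.
components = sum(v for k, v in df.items() if k != "total")
```

For the 2 × 3 design with 10 subjects per cell this gives 1, 2, 2, and 54 degrees of freedom, which sum to the total of 59.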

Table 16.1 shows in summary form the sum of squares, degrees of freedom, 
and variance estimates for a two-way classification with n entries per cell. 


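TABLE 16.1
ANALYSIS OF VARIANCE FOR TWO-WAY CLASSIFICATION WITH n ENTRIES PER CELL:
n > 1

$$
\begin{array}{llll}
\text{Source} & \text{Sum of squares} & \text{df} & \text{Variance estimate} \\ \hline
\text{Rows} & nC \sum_{r=1}^{R} (\bar{X}_{r..} - \bar{X}_{...})^2 & R - 1 & s_r^2 \\
\text{Columns} & nR \sum_{c=1}^{C} (\bar{X}_{.c.} - \bar{X}_{...})^2 & C - 1 & s_c^2 \\
\text{Interaction} & n \sum_{r=1}^{R} \sum_{c=1}^{C} (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...})^2 & (R - 1)(C - 1) & s_{rc}^2 \\
\text{Within cells} & \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{rc.})^2 & RC(n - 1) & s_w^2 \\ \hline
\text{Total} & \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} (X_{rci} - \bar{X}_{...})^2 & nRC - 1 &
\end{array}
$$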


F ratios are formed from the variance estimates and used to test the
significance of row, column, and, where n > 1, interaction effects. The
correct procedure here, and the interpretation of the variance estimates,
depends on the statistical model appropriate for the experiment. Three
models may be identified: fixed, random, and mixed. The investigator must
decide which model fits his experiment. This decision determines how the
variance estimates are used in the application of tests of significance to the
data. Before proceeding with a discussion of these models (Sec. 16.6), the
meaning of the interaction term is discussed.


16.5. The Nature of Interaction 


The algebraic partitioning of sums of squares in a two-way classification, 
where n > 1, leads to the interaction term

$$n \sum_{r=1}^{R} \sum_{c=1}^{C} (\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...})^2$$


The nature of interaction may be illustrated by example. Consider a simple 
agricultural experiment with two varieties of wheat and two types of fertilizer.
Assume that one variety of wheat has a higher yield than the other.
If the yield is uniformly higher regardless of which fertilizer is used, then
there is no interaction between the two experimental variables. If, however,
one variety produces a relatively higher yield with one type of fertilizer than 
with the other, then the two variables may be said to interact. To illustrate 
further, assume that we have two methods of teaching arithmetic and two 
teachers. Each teacher uses the two methods on separate groups of pupils. 
The achievement of the pupils is measured. If one method of instruction is 
uniformly superior or inferior regardless of which teacher uses it, then there 
is no interaction between methods and teacher. If, however, one teacher 
obtains better results with one method than the other, and the opposite 
holds for the other teacher, then teachers and methods may be said to 
interact. 
TABLE 16.2
COMPARISON OF OBSERVED CELL MEANS AND MEANS EXPECTED UNDER
ZERO INTERACTION


Observed, $\bar{X}_{rc.}$          Expected, $E(\bar{X}_{rc.})$
I     II    III             I     II    III


24 36 30 30 


Table 16.2 shows observed cell means for a two-way classification with 
three categories for each of the experimental variables. The observed cell
entries are means based on an equal number of cases. What are the expected
cell means on the assumption of zero interaction? This situation is some- 
what analogous to the calculation of expected values for contingency tables. 
For a contingency table we calculate expected cell frequencies. Here we are 
required to calculate the expected cell means on the assumption that the 
two experimental variables function independently. 

Assuming zero interaction, certain constant differences will be maintained 
between cell means. In Table 16.2 the observed row mean for A is 10 points
less than the row mean for B. If the interaction were zero, we should expect
a constant 10-point difference to occur between means for A and B under
treatments I, II, and III. A similar relationship would be expected on
comparing all other rows and columns of this table. Obviously, the observed 
values in Table 16.2 do not exhibit this characteristic. The interaction is 
not zero. 

Where the interaction is zero, a deviation of a cell mean from the mean of 
the row (or column) to which it belongs will be equal to the deviation of its 
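column (or row) mean from the grand mean. If $\bar{X}_{rc.}$ is a cell mean and $\bar{X}_{r..}$
and $\bar{X}_{.c.}$ are its corresponding row and column means, then under zero
interaction, $\bar{X}_{rc.} - \bar{X}_{r..} = \bar{X}_{.c.} - \bar{X}_{...}$. Thus the expected value of $\bar{X}_{rc.}$
under zero interaction is given by

$$E(\bar{X}_{rc.}) = \bar{X}_{r..} + \bar{X}_{.c.} - \bar{X}_{...}$$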


These expected values have been calculated for the observed data of Table 
16.2 and are shown to the right of the table. On comparing the expected 
values in any two rows or columns, note the constant increment or decrement. 
If $\bar{X}_{rc.}$ is an observed and $E(\bar{X}_{rc.})$ an expected value, the deviation of an
observed from an expected value is $\bar{X}_{rc.} - \bar{X}_{r..} - \bar{X}_{.c.} + \bar{X}_{...}$. The interaction
term in the analysis of variance is n times the sum of squares of
deviations of the observed cell means from the expected cell means. 
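
The calculation of expected cell means under zero interaction may be sketched in Python. The cell means below are hypothetical and are not the values of Table 16.2; the function names `expected_cell_means` and `interaction_ss` and the choice n = 4 are assumptions.

```python
def expected_cell_means(cell_means):
    """Cell means expected under zero interaction: row mean + column mean - grand mean."""
    R, C = len(cell_means), len(cell_means[0])
    row = [sum(cell_means[r]) / C for r in range(R)]
    col = [sum(cell_means[r][c] for r in range(R)) / R for c in range(C)]
    grand = sum(row) / R
    return [[row[r] + col[c] - grand for c in range(C)] for r in range(R)]

def interaction_ss(cell_means, n):
    """n times the squared deviations of observed from expected cell means."""
    expected = expected_cell_means(cell_means)
    return n * sum((obs - exp) ** 2
                   for obs_row, exp_row in zip(cell_means, expected)
                   for obs, exp in zip(obs_row, exp_row))

# Hypothetical observed cell means for rows A, B and treatments I, II, III.
observed = [[20, 30, 10],
            [40, 20, 30]]
expected = expected_cell_means(observed)
ss_interaction = interaction_ss(observed, n=4)
```

In this example the row means are 20 and 30, so every expected mean in row B exceeds the corresponding expected mean in row A by the same 10 points, while the observed means do not show a constant difference.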


16.6. Finite, Random, Fixed, and Mixed Models 


Different authors recommend different procedures for testing row, column, 
and interaction effects in a two-way analysis of variance. Difficulties 
associated with the selection of the appropriate procedure are resolved by 
the recognition of a general statistical model underlying the analysis of 
variance. This model is referred to here as the finite model. Three particular
cases of the finite model may be identified. These are the random, fixed,
and mixed models. The models appropriate for different experiments differ. 
The investigator must decide which model best represents his experiment. 
The choice of model determines the procedure for testing row, column, and 
interaction effects. The choice of model depends on the nature of the 
variables used as the basis of classification in the experimental design. 




The general finite model makes the linearity assumption that a deviation
of an observation $X_{rci}$ from the population value of the grand mean $\mu$ may be
expressed in the form

$$X_{rci} - \mu = a_r + b_c + (ab)_{rc} + e_{rci} \tag{16.2}$$

The four quantities to the right are in deviation form. Thus $a_r = \mu_{r..} - \mu$,
a deviation of the population value of the row mean from the grand mean $\mu$.
Similarly, $b_c = \mu_{.c.} - \mu$, a deviation of a column mean from the grand mean.
The interaction term $(ab)_{rc} = \mu_{rc.} - \mu_{r..} - \mu_{.c.} + \mu$, and the error term
$e_{rci} = X_{rci} - \mu_{rc.}$. Where this model is used to represent experimental data
the implicit assumption is made that treatment effects can meaningfully be
partitioned into additive components for each sampling unit. Because
$a_r$, $b_c$, $(ab)_{rc}$, and $e_{rci}$ are in deviation form, they sum to zero. The population
variances of the four components are $\sigma_a^2$, $\sigma_b^2$, $\sigma_{ab}^2$, and $\sigma_e^2$.

The null hypothesis under test, for example, for row effects is
$H_0: \mu_{1..} = \mu_{2..} = \cdots = \mu_{R..}$. This hypothesis may be stated in the form
$H_0: \sigma_a^2 = 0$. Similarly, the null hypotheses for column and interaction
effects may be stated as $H_0: \sigma_b^2 = 0$ and $H_0: \sigma_{ab}^2 = 0$. We wish to obtain
from the experimental data information which will provide a valid test of 
these hypotheses. 

We now consider an actual experiment involving R levels of one variable
and C levels of another. The R and C levels may be regarded as samples
drawn at random from two populations of levels comprised of $R_p$ and $C_p$
members, respectively. Thus we conceptualize two populations of levels.
The levels used in a particular experiment are construed to be drawn at
random from these two populations. $R_p$, R, $C_p$, and C may take any integral
values, provided, of course, that $R \le R_p$ and $C \le C_p$. The RC treatment
combinations are assigned at random to the nRC sampling units or individuals.
Under these conditions, and given the basic linearity assumption,
Wilk and Kempthorne (1955) have shown that the expectations of the mean
squares for the general finite model are as shown in Table 16.3.

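TABLE 16.3
EXPECTATION OF MEAN SQUARES FOR GENERAL FINITE MODEL: TWO-WAY ANALYSIS
OF VARIANCE WITH n ENTRIES IN EACH CELL: n > 1

$$
\begin{array}{ll}
\text{Mean square} & \text{Expectation of mean square} \\ \hline
\text{Row, } s_r^2 & \sigma_e^2 + \dfrac{C_p - C}{C_p}\, n\sigma_{ab}^2 + nC\sigma_a^2 \\
\text{Column, } s_c^2 & \sigma_e^2 + \dfrac{R_p - R}{R_p}\, n\sigma_{ab}^2 + nR\sigma_b^2 \\
\text{Interaction, } s_{rc}^2 & \sigma_e^2 + n\sigma_{ab}^2 \\
\text{Within cells, } s_w^2 & \sigma_e^2
\end{array}
$$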


Thus the mean squares provide estimates of variance components, and
these are used to test the significance of row, column, and interaction effects.
How these are used depends on a consideration of three particular instances
of the general finite model.

Consider an experiment involving R levels of one variable and C of another,
these being regarded as random samples of levels from populations comprised
of $R_p$ and $C_p$ members. We may consider a case where $R_p$ and $C_p$ are very
large, so that $R_p \gg R$ and $C_p \gg C$, where $\gg$ denotes much greater than.
Under these circumstances such terms as $(R_p - R)/R_p$ and $(C_p - C)/C_p$
approach unity. When this is so we have what is referred to as a random
model situation. The expectations for the random model are obtained by
substituting $(R_p - R)/R_p = 1$ and $(C_p - C)/C_p = 1$ in the expectations of
the mean squares for the general finite model given in Table 16.3. Thus
the random model is a particular case of the finite model.

In psychological research, experiments where the random model is appro- 
priate are not numerous. Satisfactory examples are not readily found. 
One example is an experiment where each member of a sample of R job 
applicants is assigned a rating by each member of a sample of C interviewers. 
Here both job applicants and interviewers may be viewed as samples drawn 
at random from populations such that $R_p \gg R$ and $C_p \gg C$.

In many experiments the R levels of one variable and the C levels of the
other are not conceptualized as random samples. In agricultural experiments
where R varieties of wheat and C varieties of fertilizer are used, the
investigator is usually concerned with the yield of particular wheat varieties
and with the effect of particular fertilizers on yield. He is not concerned with
drawing inferences about hypothetical populations of wheat and fertilizer
varieties. Both variables or factors are fixed. Any factor is fixed
if the investigator on repeating the experiment would use the same levels
of it. Under the fixed model $R = R_p$ and $C = C_p$. By substituting
$(R_p - R)/R_p = 0$ and $(C_p - C)/C_p = 0$ in the expectations of the mean
squares for the finite model given in Table 16.3, the expectations for the
fixed model are obtained.

In psychological experiments different methods of learning, environmental 
conditions, methods of inducing stress, and the like, are examples of fixed 
factors or variables. In many experiments different levels of the experi- 
mental variable are introduced, e.g., levels of illumination, time intervals, 
size of brain lesion, and dosages of a drug. While the levels may be thought 
to constitute a representative set and interpolation between levels may be 
possible, such variables are usually regarded as fixed. Of course it is possible 
to conceptualize a study where, for example, levels of illumination or dosages 
of a drug are sampled at random from a population of levels or dosages. 
Ordinarily, however, experiments are not designed in this way. 

In many experiments one basis of classification is a random factor or
variable and the other is fixed. Measurements may be obtained for a sample
of R individuals for each of C treatments or experimental conditions. Here
one basis of classification is random and the other is fixed. This is a mixed
model. In the mixed-model situation either $R_p = R$ and $C_p \gg C$ or $R_p \gg R$
and $C_p = C$. By substituting $(R_p - R)/R_p = 1$ and $(C_p - C)/C_p = 0$, or
vice versa, in the expectations for the finite model of Table 16.3, we obtain
the expectations for the mixed model.

Table 16.3 may be used to provide the required expectations for a two-way
classification where n = 1. Under this circumstance no within-cells variance
estimate $s_w^2$ is available. The expectations for row, column, and interaction
effects for the random, fixed, and mixed models are obtained by writing n = 1
and substituting the appropriate values of $(R_p - R)/R_p$ and $(C_p - C)/C_p$.
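
The way the three models arise from the finite model can be illustrated with a short Python sketch. It assumes that the expectations of Table 16.3 take the form $\sigma_e^2 + [(C_p - C)/C_p]\,n\sigma_{ab}^2 + nC\sigma_a^2$ for rows, $\sigma_e^2 + [(R_p - R)/R_p]\,n\sigma_{ab}^2 + nR\sigma_b^2$ for columns, $\sigma_e^2 + n\sigma_{ab}^2$ for interaction, and $\sigma_e^2$ within cells. The function name, argument names, and the numerical variance components are hypothetical.

```python
def expected_mean_squares(n, R, C, var_a, var_b, var_ab, var_e, row_frac, col_frac):
    """Expected mean squares under the finite model.

    row_frac stands for (R_p - R)/R_p and col_frac for (C_p - C)/C_p.
    """
    return {
        "rows": var_e + col_frac * n * var_ab + n * C * var_a,
        "columns": var_e + row_frac * n * var_ab + n * R * var_b,
        "interaction": var_e + n * var_ab,
        "within cells": var_e,
    }

# Limiting values of ((R_p - R)/R_p, (C_p - C)/C_p) for each model.
MODELS = {
    "random": (1, 1),
    "fixed": (0, 0),
    "mixed, rows random": (1, 0),
    "mixed, columns random": (0, 1),
}

# Hypothetical variance components, used only to show the substitution.
expectations = {
    name: expected_mean_squares(n=2, R=3, C=4, var_a=1.0, var_b=2.0,
                                var_ab=0.5, var_e=3.0,
                                row_frac=rf, col_frac=cf)
    for name, (rf, cf) in MODELS.items()
}
```

Under the fixed model the interaction component drops out of the row and column expectations, whereas under the random model it appears in both.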


16.7. Choice-of-error Term 


By choice-of-error term is meant the selection of the appropriate variance 
estimate for the denominator of the F ratio in testing row, column, and 
interaction effects. In general, in forming an F ratio, the expectation of the 
variance estimate in the numerator should contain one term more than the 
expectation of the variance estimate in the denominator, the additional term 
involving the effect under test. On applying this principle to the expecta- 
tions of Table 16.3, the following rules may be formulated: 

1. Random model: n > 1. The proper error term for testing the interaction
effect is $s_w^2$. $F_{rc} = s_{rc}^2/s_w^2$. The correct error term for testing row and
column effects is $s_{rc}^2$. $F_r = s_r^2/s_{rc}^2$ and $F_c = s_c^2/s_{rc}^2$.

2. Fixed model: n > 1. The proper error term is $s_w^2$ for interaction, row,
and column effects. The three F ratios are $F_{rc} = s_{rc}^2/s_w^2$, $F_r = s_r^2/s_w^2$, and
$F_c = s_c^2/s_w^2$.

3. Mixed model: n > 1. The proper error term for testing the interaction
effect is $s_w^2$. $F_{rc} = s_{rc}^2/s_w^2$. When R is random and C is fixed, the proper
error term for testing row effects is $s_w^2$. $F_r = s_r^2/s_w^2$. The proper term for
testing column effects is $s_{rc}^2$. $F_c = s_c^2/s_{rc}^2$. When R is fixed and C is random,
the converse procedure applies. $F_r = s_r^2/s_{rc}^2$ and $F_c = s_c^2/s_w^2$.

4. Random model: n = 1. No $s_w^2$ is available. The correct error
term for testing both row and column effects is $s_{rc}^2$. $F_r = s_r^2/s_{rc}^2$
and $F_c = s_c^2/s_{rc}^2$.

5. Fixed model: n = 1. The point of view may be adopted that no test 
of either row or column effects can be made. This point of view requires 
some modification. The ratio $s_r^2/s_{rc}^2$ is an estimate of $(\sigma_e^2 + C\sigma_a^2)/(\sigma_e^2 + \sigma_{ab}^2)$
and will, where $\sigma_{ab}^2 > 0$, be an underestimate of $(\sigma_e^2 + C\sigma_a^2)/\sigma_e^2$. This
means that if a significant result is obtained, the investigator knows a fortiori
that the effect tested is significant. If the result is not significant, the probability
of accepting the null hypothesis, $H_0: \sigma_a^2 = 0$, when it is false, may be
high. Thus in the absence of significance no conclusions should be drawn 
from the data. 




6. Mixed model: n = 1. When R is random and C is fixed, the situation
pertaining to the testing of row effects is as described above for the
fixed model, n = 1. The proper error term for the column effect is $s_{rc}^2$.
$F_c = s_c^2/s_{rc}^2$. When C is random and R is fixed, the argument relating to the
fixed model, n = 1, again applies. The proper error term for the row effect
is $s_{rc}^2$. $F_r = s_r^2/s_{rc}^2$.

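The above rules, excluding the modifications of rules 5 and 6 above, can
be very simply obtained by using the following schema for the proper
choice-of-error term:

$$
\begin{array}{ll}
\text{Row:} & \dfrac{C_p - C}{C_p}\, s_{rc}^2 + \dfrac{C}{C_p}\, s_w^2 \\
\text{Column:} & \dfrac{R_p - R}{R_p}\, s_{rc}^2 + \dfrac{R}{R_p}\, s_w^2 \\
\text{Interaction:} & s_w^2
\end{array}
$$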


For the random model, $(C_p - C)/C_p = 1$ and $C/C_p = 0$. The proper error
term for row and column effects is $s_{rc}^2$. Similarly, the proper error term for
the fixed and mixed models may be obtained. When n = 1, all terms
containing $s_w^2$ vanish. For the random model, $s_{rc}^2$ becomes the correct error
term for row and column effects. For the fixed model, no tests are possible.
When rows are random and the columns are fixed, the column effect may be 
tested, but not the row. 
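
The rules for the choice of error term can be collected into one small Python function. This is a sketch of the schema only, not of the modifications discussed under rules 5 and 6; the function name `error_terms` and the string labels are assumptions.

```python
def error_terms(rows_random, cols_random, n):
    """Name the mean square used as the denominator of each F ratio.

    A value of None means the schema provides no error term for that effect.
    """
    interaction = "interaction"
    within = "within cells" if n > 1 else None
    return {
        # Row effects are tested against interaction when columns are random.
        "rows": interaction if cols_random else within,
        # Column effects are tested against interaction when rows are random.
        "columns": interaction if rows_random else within,
        # Interaction can be tested only when a within-cells estimate exists.
        "interaction": within,
    }

random_model = error_terms(rows_random=True, cols_random=True, n=3)
fixed_model = error_terms(rows_random=False, cols_random=False, n=3)
mixed_single = error_terms(rows_random=True, cols_random=False, n=1)
```

With rows random, columns fixed, and n = 1, the function names the interaction mean square for the column effect and returns None for the row effect, in agreement with the closing remark above.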


16.8. Pooling Sums of Squares: n > 1 


Under certain circumstances the within-cells and interaction sums of 
squares may be added together and divided by the combined degrees of 
freedom to obtain an estimate of variance based on a larger number of degrees 
of freedom. Caution should be exercised in applying this procedure. 

For the fixed model, the within-cells variance estimate is the proper error 
term for testing interaction, row, and column effects. For the random 
model, the interaction variance estimate is the proper error term for testing 
row and column effects. These procedures are always correct. For both
models, when the interaction is quite clearly not significant, the within-cells
and interaction sums of squares may be pooled to obtain a variance estimate 
for the denominator of the F ratio based on a larger number of degrees of 
freedom. Of course, when row and column effects are clearly significant, 
when tested without pooling, the pooling procedure is unnecessary. 

When doubt exists as to the significance of the interaction, the investigator 
may or may not choose to pool the sums of squares. If the interaction effect 
in fact exists, $\sigma_{ab}^2$ being greater than zero, and terms are pooled, the pooling
may be said to be erroneous. 

For the fixed model, erroneous pooling will increase the size of the error
term. For the random model, erroneous pooling will decrease the size of
the error term. In both instances the number of degrees of freedom is 
increased. Erroneous pooling will for the fixed model usually lead to too 
few significant results and for the random model to too many significant 
results. 

For the mixed model, when rows are random and columns are fixed, 
pooling may be applied with nonsignificant interaction. In this situation 
erroneous pooling will tend to make the error term too large for testing row 
effects and too small for testing column effects, leading to too few significant 
effects for rows and too many for columns. 

An understanding of the consequences of pooling sums of squares for 
fixed, mixed, and random models, when interaction does exist, that is, when 
$\sigma_{ab}^2 > 0$, may be obtained by examination of the expectations of the variance
estimates given in Table 16.3. Quite clearly, for the fixed model, when
$\sigma_{ab}^2 > 0$, combining interaction and within cells will lead to an error term
whose expectation is greater than $\sigma_e^2$. Consequently, too few significant
results will be obtained. 

In general, it is probably advisable not to pool unless the investigator is 
quite confident that the interaction is not significant. For a detailed discus- 
sion of this rather troublesome problem, see Binder (1955). 
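
The arithmetic of pooling is simple and may be sketched as follows. The sums of squares and degrees of freedom used here are hypothetical, and the function name `pooled_error` is an assumption.

```python
def pooled_error(ss_interaction, df_interaction, ss_within, df_within):
    """Pool the interaction and within-cells sums of squares into one variance estimate."""
    df = df_interaction + df_within
    return (ss_interaction + ss_within) / df, df

# Hypothetical values for a 2 x 3 design with 10 observations per cell.
ms_within = 540.0 / 54                      # unpooled within-cells estimate
ms_pooled, df_pooled = pooled_error(40.0, 2, 540.0, 54)
```

The pooled estimate rests on 56 rather than 54 degrees of freedom. Whether pooling is justified remains the judgment described above; the code only carries out the addition and division.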


16.9. Computation Formulas for Sums of Squares 


Computation formulas are used to calculate the required sums of squares. 
A simplified notation is used. Denote the sum of all observations in the
rth row by $T_{r.}$, the sum of all observations in the cth column by $T_{.c}$, the sum
of all observations in the cell corresponding to the rth row and cth column by
$T_{rc}$, and the sum of all N observations by $T$.

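With one entry in each cell, the computation formulas for sums of squares
are as follows:

$$\text{Rows:} \quad \frac{1}{C} \sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} \tag{16.3}$$

$$\text{Columns:} \quad \frac{1}{R} \sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} \tag{16.4}$$

$$\text{Interaction:} \quad \sum_{r=1}^{R} \sum_{c=1}^{C} X_{rc}^2 - \frac{1}{C} \sum_{r=1}^{R} T_{r.}^2 - \frac{1}{R} \sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} \tag{16.5}$$

$$\text{Total:} \quad \sum_{r=1}^{R} \sum_{c=1}^{C} X_{rc}^2 - \frac{T^2}{N} \tag{16.6}$$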




The interaction sum of squares may be obtained by adding the row and column
sums of squares and subtracting this sum from the total sum of squares. This provides no
check on the accuracy of the calculation; consequently it is preferable to 
compute the interaction term directly. 

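Computation formulas for sums of squares with n entries in each cell are
as follows:

$$\text{Rows:} \quad \frac{1}{nC} \sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} \tag{16.7}$$

$$\text{Columns:} \quad \frac{1}{nR} \sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} \tag{16.8}$$

$$\text{Interaction:} \quad \frac{1}{n} \sum_{r=1}^{R} \sum_{c=1}^{C} T_{rc}^2 - \frac{1}{nC} \sum_{r=1}^{R} T_{r.}^2 - \frac{1}{nR} \sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} \tag{16.9}$$

$$\text{Within cells:} \quad \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} X_{rci}^2 - \frac{1}{n} \sum_{r=1}^{R} \sum_{c=1}^{C} T_{rc}^2 \tag{16.10}$$

$$\text{Total:} \quad \sum_{r=1}^{R} \sum_{c=1}^{C} \sum_{i=1}^{n} X_{rci}^2 - \frac{T^2}{N} \tag{16.11}$$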


Here again the interaction sums of squares may be obtained by subtracting 
the row, column, and within-cells sums of squares from the total, although 
direct calculation of the interaction term is preferable. 

The reader should note that the analysis of variance for two-way classifica- 
tion with a single entry in each cell is a particular case of the more general 
case with more than one entry in each cell. When n = 1, formulas for the 
latter case become the formulas for the former. 
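
The computation formulas for n entries per cell translate directly into code. The Python sketch below works from row, column, and cell totals; the function name `sums_of_squares` and the small data set are hypothetical.

```python
def sums_of_squares(X):
    """Sums of squares from totals, for scores X[r][c][i] with n entries per cell."""
    R, C, n = len(X), len(X[0]), len(X[0][0])
    N = n * R * C
    T_rc = [[sum(X[r][c]) for c in range(C)] for r in range(R)]    # cell totals
    T_r = [sum(T_rc[r]) for r in range(R)]                         # row totals
    T_c = [sum(T_rc[r][c] for r in range(R)) for c in range(C)]    # column totals
    T = sum(T_r)                                                   # grand total
    correction = T ** 2 / N
    sum_x2 = sum(x * x for r in range(R) for c in range(C) for x in X[r][c])
    rows_term = sum(t * t for t in T_r) / (n * C)
    cols_term = sum(t * t for t in T_c) / (n * R)
    cells_term = sum(t * t for row in T_rc for t in row) / n
    return {
        "rows": rows_term - correction,
        "columns": cols_term - correction,
        "interaction": cells_term - rows_term - cols_term + correction,
        "within cells": sum_x2 - cells_term,
        "total": sum_x2 - correction,
    }

# Hypothetical scores: R = 2, C = 2, n = 2.
ss = sums_of_squares([[[1, 3], [2, 4]], [[5, 7], [6, 8]]])
```

For these scores the row, column, interaction, and within-cells sums of squares are 32, 2, 0, and 8, which add to the total of 42. With n = 1 the within-cells term is zero and the remaining expressions reduce to the single-entry formulas.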


16.10. Illustrative Example of Two-way Classification: n = 1 


Table 16.4 shows hypothetical data for two-way classification with one 
entry per cell. Rows are individuals, and columns are treatments. The 
data are presumed to relate to a random sample of individuals tested under 
different treatment conditions. This is a mixed model. One basis of classi- 
fication, the columns, is fixed. The other basis of classification, the rows, is 
random. 


Applying the appropriate computation formulas, the following sums of 
squares are obtained: 


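Rows:

$$\frac{1}{C} \sum_{r=1}^{R} T_{r.}^2 - \frac{T^2}{N} = \frac{394{,}350}{4} - \frac{(1{,}970)^2}{40} = 1{,}565.00$$

Columns:

$$\frac{1}{R} \sum_{c=1}^{C} T_{.c}^2 - \frac{T^2}{N} = \frac{1{,}045{,}756}{10} - \frac{(1{,}970)^2}{40} = 7{,}553.10$$

Interaction:

$$\sum_{r=1}^{R} \sum_{c=1}^{C} X_{rc}^2 - \frac{1}{C} \sum_{r=1}^{R} T_{r.}^2 - \frac{1}{R} \sum_{c=1}^{C} T_{.c}^2 + \frac{T^2}{N} = 122{,}984 - \frac{394{,}350}{4} - \frac{1{,}045{,}756}{10} + \frac{(1{,}970)^2}{40} = 16{,}843.40$$

Total:

$$\sum_{r=1}^{R} \sum_{c=1}^{C} X_{rc}^2 - \frac{T^2}{N} = 122{,}984 - \frac{(1{,}970)^2}{40} = 25{,}961.50$$
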
TABLE 16.4
DATA FOR THE ANALYSIS OF VARIANCE WITH TWO-WAY CLASSIFICATION: n = 1
SCORES FOR A SAMPLE OF SUBJECTS TESTED UNDER FOUR DIFFERENT CONDITIONS

                        Conditions
Subject          A        B        C        D      $T_{r.}$   $\bar{X}_{r.}$
   1            31       42       14       80       167     41.75
   2            42       26       25      106       199     49.75
   3            84       21       19       83       207     51.75
   4            26       60       36       69       191     47.75
   5            14       35       44       48       141     35.25
   6            16       80       28       76       200     50.00
   7            29       49       80       39       197     49.25
   8            32       38       76       84       230     57.50
   9            45       65       15       91       216     54.00
  10            30       71       82       39       222     55.50

$T_{.c}$          349      487      419      715     $T$ = 1,970
$\bar{X}_{.c}$        34.90    48.70    41.90    71.50     $\bar{X}_{..}$ = 49.25

$$\sum_{r=1}^{R} T_{r.}^2 = 394{,}350 \qquad \sum_{c=1}^{C} T_{.c}^2 = 1{,}045{,}756 \qquad \sum_{r=1}^{R} \sum_{c=1}^{C} X_{rc}^2 = 122{,}984$$


Table 16.5 summarizes the analysis-of-variance data for this example.

TABLE 16.5
ANALYSIS OF VARIANCE FOR DATA OF TABLE 16.4

Source of variation     Sum of squares     df     Variance estimate
Rows                        1,565.00        9        173.89 = s_r^2
Columns                     7,553.10        3      2,517.70 = s_c^2
Interaction                16,843.40       27        623.83 = s_rc^2
Total                      25,961.50       39

F_c = s_c^2 / s_rc^2 = 4.04          F_r = s_r^2 / s_rc^2 = .279

Because this is a mixed model with n = 1, no meaningful test of row effects
is possible. The proper error term for column effects is $s_{rc}^{2}$. The
F ratio for column effects is found to be 4.04. The F ratios
required for significance with 3 and 27 degrees of freedom associated with 
the numerator and denominator, respectively, are 2.96 at the 5 per cent and 
4.60 at the 1 per cent levels. Thus the column differences are significant at 
the 5 per cent level but fall short of significance at the 1 per cent level. 
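
The arithmetic of this example can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `anova_single_entry` and the dictionary keys are illustrative choices, and the first score for subject 8 is assumed to be 32, the value implied by that subject's row total of 230. Run on these data, the column sum of squares comes out as 7,553.10, the figure consistent with the variance estimate 2,517.70 on 3 degrees of freedom.

```python
# Scores from Table 16.4: one row per subject, one column per condition A-D.
# Assumption: subject 8's first score is 32 (implied by the row total of 230).
SCORES = [
    [31, 42, 14, 80],
    [42, 26, 25, 106],
    [84, 21, 19, 83],
    [26, 60, 36, 69],
    [14, 35, 44, 48],
    [16, 80, 28, 76],
    [29, 49, 80, 39],
    [32, 38, 76, 84],
    [45, 65, 15, 91],
    [30, 71, 82, 39],
]


def anova_single_entry(table):
    """Two-way analysis of variance with one entry per cell (n = 1)."""
    n_rows, n_cols = len(table), len(table[0])
    n_total = n_rows * n_cols
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    correction = sum(row_totals) ** 2 / n_total        # T^2 / N
    raw_ss = sum(x * x for row in table for x in row)  # sum of all X^2

    ss_rows = sum(t * t for t in row_totals) / n_cols - correction
    ss_cols = sum(t * t for t in col_totals) / n_rows - correction
    ss_total = raw_ss - correction
    ss_inter = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n_rows - 1)
    ms_cols = ss_cols / (n_cols - 1)
    ms_inter = ss_inter / ((n_rows - 1) * (n_cols - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_inter": ss_inter,
        "ss_total": ss_total,
        "ms_rows": ms_rows,
        "ms_cols": ms_cols,
        "ms_inter": ms_inter,
        "f_cols": ms_cols / ms_inter,   # interaction is the error term
        "f_rows": ms_rows / ms_inter,
    }


result = anova_single_entry(SCORES)
for name, value in result.items():
    print(f"{name:>9}: {value:12.3f}")
```

Obtaining the interaction term by subtraction, as the script does, is the shortcut mentioned in the text; the direct formula gives the same value.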


16.11. Illustrative Example of Two-way Classification: n > 1 


Table 16.6 shows data obtained in an animal experiment designed to study
the effects of two variables on measures of performance of rats in a maze
test. Three strains of rats were used, bright, mixed, and dull. A group
from each strain was reared under free and restricted environmental condi-
tions. Thus there are six groups of experimental animals with eight animals
in each group. The total N is 48. The data are arranged in a 2 X 3 table
with eight observations in each of the six cells. The row means permit a
comparison of environments, and the column means a comparison of strains.
Table 16.7 shows the sums, means, and sum of squares of row, column, and
cell totals. The sum of squares for all the observations is also given.


TABLE 16.6
DATA FOR THE ANALYSIS OF VARIANCE WITH TWO-WAY CLASSIFICATION: n > 1
ERROR SCORES FOR THREE STRAINS OF RATS REARED UNDER
TWO ENVIRONMENTAL CONDITIONS

                                Strain
Environment        Bright         Mixed          Dull
Free               26   14       41   82       36   87
                   41   16       26   86       39   99
                   28   29       19   45       59  126
                   92   31       59   37       27  104
Restricted         51   35       39  114       42  133
                   96   36      104   92       92  124
                   (the remaining two rows of restricted-environment
                   scores are illegible in the source)


TABLE 16.7

COMPUTATION FOR DATA OF TABLE 16.6

                                    Strain
Environment       Bright            Mixed             Dull              Total
Free              T_11 = 277        T_12 = 395        T_13 = 577        T_1. = 1,249
                  Mean = 34.63      Mean = 49.38      Mean = 72.13      Mean = 52.04
Restricted        T_21 = 441        T_22 = 752        T_23 = 901        T_2. = 2,094
                  Mean = 55.13      Mean = 94.00      Mean = 112.63     Mean = 87.25
Total             T_.1 = 718        T_.2 = 1,147      T_.3 = 1,478      T_.. = 3,343
                  Mean = 44.88      Mean = 71.69      Mean = 92.38      Mean = 69.65

$$\sum_{r=1}^{R} T_{r.}^{2} = 5{,}944{,}837 \qquad \sum_{c=1}^{C} T_{.c}^{2} = 4{,}015{,}617$$

$$\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^{2} = 2{,}137{,}469 \qquad \sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^{2} = 309{,}851$$

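Applying the computation formulas, the calculations are as follows:

Rows:

$$\frac{1}{nC}\sum_{r=1}^{R} T_{r.}^{2} - \frac{T_{..}^{2}}{N} = \frac{5{,}944{,}837}{24} - \frac{(3{,}343)^{2}}{48} = 14{,}875.52$$

Columns:

$$\frac{1}{nR}\sum_{c=1}^{C} T_{.c}^{2} - \frac{T_{..}^{2}}{N} = \frac{4{,}015{,}617}{16} - \frac{(3{,}343)^{2}}{48} = 18{,}150.04$$

Within cells:

$$\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^{2} - \frac{1}{n}\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^{2} = 309{,}851 - \frac{2{,}137{,}469}{8} = 42{,}667.38$$

Interaction:

$$\frac{1}{n}\sum_{r=1}^{R}\sum_{c=1}^{C} T_{rc}^{2} - \frac{1}{nC}\sum_{r=1}^{R} T_{r.}^{2} - \frac{1}{nR}\sum_{c=1}^{C} T_{.c}^{2} + \frac{T_{..}^{2}}{N} = \frac{2{,}137{,}469}{8} - \frac{5{,}944{,}837}{24} - \frac{4{,}015{,}617}{16} + \frac{(3{,}343)^{2}}{48} = 1{,}332.04$$

Total:

$$\sum_{r=1}^{R}\sum_{c=1}^{C}\sum_{i=1}^{n} X_{rci}^{2} - \frac{T_{..}^{2}}{N} = 309{,}851 - \frac{(3{,}343)^{2}}{48} = 77{,}024.98$$
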


The analysis-of-variance table for these data is given in Table 16.8.

TABLE 16.8
ANALYSIS OF VARIANCE FOR DATA OF TABLE 16.6

Source of variation       Sum of squares     df     Variance estimate
Rows (environments)          14,875.52        1     14,875.52 = s_r^2
Columns (strains)            18,150.04        2      9,075.02 = s_c^2
Interaction                   1,332.04        2        666.02 = s_rc^2
Within cells                 42,667.38       42      1,015.89 = s_w^2
Total                        77,024.98       47

F_r = s_r^2 / s_w^2 = 14.64     F_c = s_c^2 / s_w^2 = 8.93     F_rc = s_rc^2 / s_w^2 = .656

The df for rows is R − 1 = 2 − 1 = 1, for columns C − 1 = 3 − 1 = 2, for
interaction (R − 1)(C − 1) = (2 − 1)(3 − 1) = 2, and for within cells
RC(n − 1) = 2 × 3(8 − 1) = 42. These sum to the total number of degrees of
freedom, RCn − 1 = 2 × 3 × 8 − 1 = 47. For these data a fixed model is
appropriate, and $s_w^2$ is the proper error term for testing row, column,
and interaction effects. For interaction we have
$F_{rc} = s_{rc}^2/s_w^2 = 666.02/1{,}015.89 = .656$.
This is less than unity. The expectation on the basis of the null hypothesis
is unity. The interaction is somewhat less than we would ordinarily expect
under the null hypothesis. We may safely conclude that there is no signif-
icant interaction between the two experimental variables. For differences
in environments we have $F_r = s_r^2/s_w^2 = 14{,}875.52/1{,}015.89 = 14.64$ with
1 df associated with the numerator and 42 df with the denominator. For
these df the values required for significance at the 5 and 1 per cent levels are
4.07 and 7.27. We conclude that the different environments have affected
the maze performance of the animals. For strains the required ratio is
$F_c = s_c^2/s_w^2 = 9{,}075.02/1{,}015.89 = 8.93$ with 2 df associated with the
numerator and 42 df with the denominator. Again, this difference is
significant at well beyond the 1 per cent level, and the conclusion is that differences
in strain affect maze performance. 
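
The sums of squares and F ratios for this example can be reproduced from the cell totals alone. The sketch below is illustrative rather than part of the original text: the function name `anova_from_totals` is hypothetical, and the restricted-mixed cell total is assumed to be 752, the value implied by the restricted row total of 2,094 and confirmed by the sum of squared cell totals, 2,137,469.

```python
# Cell totals from Table 16.7 (rows: free, restricted; columns: bright,
# mixed, dull). Assumption: T_22 = 752, implied by the row total 2,094.
CELL_TOTALS = [[277, 395, 577],
               [441, 752, 901]]
N_PER_CELL = 8
RAW_SS = 309_851          # sum of squares of all 48 observations


def anova_from_totals(cell_totals, n, raw_ss):
    """Two-way analysis of variance with n observations per cell."""
    n_rows, n_cols = len(cell_totals), len(cell_totals[0])
    n_total = n_rows * n_cols * n
    row_totals = [sum(row) for row in cell_totals]
    col_totals = [sum(col) for col in zip(*cell_totals)]
    correction = sum(row_totals) ** 2 / n_total            # T^2 / N
    cells_term = sum(t * t for row in cell_totals for t in row) / n

    ss_rows = sum(t * t for t in row_totals) / (n * n_cols) - correction
    ss_cols = sum(t * t for t in col_totals) / (n * n_rows) - correction
    ss_within = raw_ss - cells_term
    ss_inter = cells_term - correction - ss_rows - ss_cols
    ss_total = raw_ss - correction

    ms_within = ss_within / (n_rows * n_cols * (n - 1))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_inter": ss_inter,
        "ss_within": ss_within,
        "ss_total": ss_total,
        # Fixed model: the within-cells estimate is the error term throughout.
        "f_rows": ss_rows / (n_rows - 1) / ms_within,
        "f_cols": ss_cols / (n_cols - 1) / ms_within,
        "f_inter": ss_inter / ((n_rows - 1) * (n_cols - 1)) / ms_within,
    }


anova = anova_from_totals(CELL_TOTALS, N_PER_CELL, RAW_SS)
for name, value in anova.items():
    print(f"{name:>9}: {value:12.3f}")
```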


16.12. Unequal Numbers in the Subclasses 


Situations arise in educational and psychological research where the 
numbers of observations in the subclasses in a two-way analysis of variance 
are unequal. In animal experimentation in psychology, this situation may 
result from loss by death or accident of a number of animals during the 
conduct of the experiment. For the fixed model, if the cell frequencies do 
not depart significantly from either equality or proportionality, simple 
adjustments may be made to the data. Two methods will be briefly 
described: the method of expected equal frequencies and the method of 
expected proportionate frequencies. The treatment given here is based on 
the work of Fei Tsao (1946). 

In applying the method of expected equal frequencies the following steps are 
involved: 

1. Apply a $\chi^2$ criterion to determine whether the cell frequencies depart
from equality. Denote the frequency in the cell corresponding to the rth
row and cth column by $n_{rc}$. The expected equal frequency is the average
value of $n_{rc}$, or N/RC. Denote this by $\bar{n}$. The required $\chi^2$ is

$$\chi^2 = \sum_{r=1}^{R}\sum_{c=1}^{C} \frac{(n_{rc} - \bar{n})^{2}}{\bar{n}}$$

with RC − 1 degrees of freedom.

2. If the cell frequencies do not depart significantly from equality at, say,
the 1 per cent level, apply a simple adjustment to the sum and sum of squares
for each cell by multiplying these values by $\bar{n}/n_{rc}$. Thus the adjusted cell
sum is

$$\frac{\bar{n}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}$$

and the adjusted cell sum of squares is

$$\frac{\bar{n}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^{2}$$

This adjustment estimates what the cell sum and sum of squares would be
were there an equal number of cases $\bar{n}$ in each cell. Note that this adjust-
ment does not change the cell means or the row and column means.

3. Use the adjusted cell sums and sums of squares to obtain row and column
totals and the total sum of squares.

4. Proceed with the analysis of variance in the usual way, employing the 
computation formulas given in 16.9. 

The method of expected equal frequencies is simple and may be usefully 
applied where the numbers of observations in the cells do not differ very 
much. 
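
Steps 1 and 2 of the method of expected equal frequencies can be sketched in a few lines. This is an illustration only: the function name `equal_frequency_adjustment` and the small 2 × 2 data set are hypothetical and do not come from the text.

```python
def equal_frequency_adjustment(cells):
    """cells[r][c] is the list of raw observations in row r, column c.

    Returns the expected equal frequency, the chi-square criterion for
    departure from equality (RC - 1 degrees of freedom), and the adjusted
    cell sums and cell sums of squares.
    """
    n_rows, n_cols = len(cells), len(cells[0])
    counts = [[len(cell) for cell in row] for row in cells]
    n_total = sum(sum(row) for row in counts)
    n_bar = n_total / (n_rows * n_cols)

    chi_sq = sum((n - n_bar) ** 2 / n_bar for row in counts for n in row)

    # Multiply each cell sum and sum of squares by n_bar / n_rc.
    adj_sums = [[n_bar / len(cell) * sum(cell) for cell in row]
                for row in cells]
    adj_squares = [[n_bar / len(cell) * sum(x * x for x in cell)
                    for cell in row] for row in cells]
    return n_bar, chi_sq, adj_sums, adj_squares


# Hypothetical data: four cells holding 4, 5, 5, and 6 observations.
CELLS = [
    [[3, 5, 4, 6], [7, 8, 6, 9, 10]],
    [[2, 4, 3, 5, 6], [8, 9, 7, 10, 11, 9]],
]
n_bar, chi_sq, adj_sums, adj_squares = equal_frequency_adjustment(CELLS)
print(n_bar, round(chi_sq, 3))
print(adj_sums)
```

Dividing an adjusted cell sum by `n_bar` returns the original cell mean, which is the sense in which the adjustment leaves the cell means unchanged.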

In situations where the numbers of observations in the cells differ, but are 
roughly proportionate to the marginal totals, the method of expected propor- 
tionate frequencies is appropriate. This method requires the following steps: 

1. Apply a $\chi^2$ criterion to determine whether the cell frequencies in the
rows and columns depart significantly from proportionality. Denote the
observed frequency in the cell corresponding to the rth row and cth column
by $n_{rc}$ and the marginal frequencies for rows and columns by $n_{r.}$ and $n_{.c}$
respectively. Denote the cell frequencies expected on the assumption of
proportionality by $\hat{n}_{rc}$. The expected frequencies are given by

$$\hat{n}_{rc} = \frac{n_{r.}\, n_{.c}}{N}$$

The procedure here is identical with that used in calculating expected cell
frequencies for a contingency table given the restrictions of the marginal
totals. The $\chi^2$ criterion is

$$\chi^2 = \sum_{r=1}^{R}\sum_{c=1}^{C} \frac{(n_{rc} - \hat{n}_{rc})^{2}}{\hat{n}_{rc}}$$

with (R − 1)(C − 1) degrees of freedom.

2. If the cell frequencies do not depart significantly from proportionality,
the sum and sum of squares for each cell are adjusted by multiplying them
by $\hat{n}_{rc}/n_{rc}$. The adjusted cell sum is then

$$\frac{\hat{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}$$

and the adjusted cell sum of squares is

$$\frac{\hat{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^{2}$$

This adjustment provides estimates of what the cell sums and sums of squares
would be were the numbers in each cell proportional to the marginal totals.

3. The required sums of squares for the analysis of variance are obtained,
using the adjusted values, by applying the following formulas:

Rows:

$$\sum_{r=1}^{R} \frac{T_{r.}^{2}}{n_{r.}} - \frac{T_{..}^{2}}{N} \qquad (16.12)$$

Columns:

$$\sum_{c=1}^{C} \frac{T_{.c}^{2}}{n_{.c}} - \frac{T_{..}^{2}}{N} \qquad (16.13)$$

Within cells:

$$\sum_{r=1}^{R}\sum_{c=1}^{C} \left( \frac{\hat{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^{2} \right) - \sum_{r=1}^{R}\sum_{c=1}^{C} \frac{T_{rc}^{2}}{\hat{n}_{rc}} \qquad (16.14)$$

Interaction:

$$\sum_{r=1}^{R}\sum_{c=1}^{C} \frac{T_{rc}^{2}}{\hat{n}_{rc}} - \sum_{r=1}^{R} \frac{T_{r.}^{2}}{n_{r.}} - \sum_{c=1}^{C} \frac{T_{.c}^{2}}{n_{.c}} + \frac{T_{..}^{2}}{N} \qquad (16.15)$$

Total:

$$\sum_{r=1}^{R}\sum_{c=1}^{C} \left( \frac{\hat{n}_{rc}}{n_{rc}} \sum_{i=1}^{n_{rc}} X_{rci}^{2} \right) - \frac{T_{..}^{2}}{N} \qquad (16.16)$$

All T's relate to adjusted values. The above formulas differ from those
previously given in 16.9 only in that they make allowance for the fact that
the numbers of cases in the subclasses are unequal.

4. Proceed with the analysis of variance in the usual way. 
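
The first two steps of the method of expected proportionate frequencies reduce to the usual contingency-table arithmetic. The sketch below is illustrative: the function name `proportionate_frequencies` and both small tables of cell counts are hypothetical.

```python
def proportionate_frequencies(counts):
    """counts[r][c] is the observed number of cases in row r, column c.

    Returns the frequencies expected under proportionality, the chi-square
    criterion with its degrees of freedom, and the factor (expected over
    observed) by which each cell sum and sum of squares is multiplied.
    """
    row_n = [sum(row) for row in counts]
    col_n = [sum(col) for col in zip(*counts)]
    total = sum(row_n)

    expected = [[r * c / total for c in col_n] for r in row_n]
    chi_sq = sum(
        (counts[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(len(row_n))
        for j in range(len(col_n))
    )
    df = (len(row_n) - 1) * (len(col_n) - 1)
    factors = [[expected[i][j] / counts[i][j] for j in range(len(col_n))]
               for i in range(len(row_n))]
    return expected, chi_sq, df, factors


# Hypothetical counts that are exactly proportional: chi-square is zero
# and every adjustment factor is 1.
exact = proportionate_frequencies([[2, 4], [3, 6]])

# Hypothetical counts that are only roughly proportional.
rough = proportionate_frequencies([[4, 6], [8, 10]])
print(round(rough[1], 4), rough[2])
```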

In the above procedure the within-cells sum of squares is based on the 
adjusted values. Arguments may be advanced for using the unadjusted 
values in calculating the within-cells sum of squares. For comment on this 
point see Gourlay (1955). 

Both the methods of expected equal and expected proportionate frequencies 
are in some degree approximate. Departure from equal n's in the former
method and from proportionality of n's in the latter method will introduce
some bias in the F test, the extent of the bias being related to the magnitude
of the departures. By bias here is meant that the F test produces either a
larger or smaller proportion of significant F ratios than is warranted by the
F distribution. 

The methods of equal and proportionate frequencies are applicable to a 
substantial proportion of situations encountered in practice. When the 
frequencies differ markedly from proportionality, other methods may be 
applied. For a discussion of these, see Snedecor (1956) and Kenney and
Keeping (1954).




For the random model, bias is introduced in the F test despite the pro- 
portionality of the numbers in the subclasses. From a practical viewpoint 
this is not an important consideration. Good examples of the random model 
with unequal n's are difficult to find in educational and psychological research.
Of more practical importance is the fact that for the mixed model F test bias 
is introduced when the cell frequencies are proportional, and experiments 
involving this model are not infrequent. The bias is positive, the F test 
producing a larger proportion of significant F ratios than the F distribution
warrants. For a discussion of this problem the reader is directed to Gourlay
(1955). 

In general, because of the complications associated with unequal fre- 
quencies, it is advisable, whenever possible, to design experiments with an 
equal number of cases in the subclasses, although for the fixed model pro- 
portionate numbers of cases in the subclasses will introduce no bias. The 
investigator will thereby avoid a number of inconvenient complexities. 


16.13. Higher-order Classification 


This chapter has concerned itself with the analysis of variance for experi- 
ments with two bases of classification. Experiments may be designed with 
more than two bases of classification with either one or more than one 
observation per cell. A common design with three bases of classification 
occurs where observations are made on every individual in a sample under 
RC different treatment conditions. A consideration of higher-order classi- 
fication is beyond the scope of this book. For a discussion of this topic the 
reader is referred to Walker and Lev (1953) and to McNemar (1955). On 
choice of proper error term for higher-order classification an examination of 
Wilk and Kempthorne (1955) will prove helpful. 


EXERCISES 


1. In an experiment involving double classification with 10 observations in each cell, the 
following cell and marginal means were obtained: 


9.6 


9.9 


10.4 3.9 15.0 98 


Compute (a) the cell means expected under zero interaction and (b) the interaction sum 
of squares. 

2. The following are measurements made on a sample of 12 subjects under three experi- 
mental conditions: 


                Condition
Subject      C1      C2      C3
   1          8       7      15
   2         19      14      20
   3          7       9       6
   4         23      20      18
   5         14      26      12
   6          6      14      15
   7          5       9      20
   8         22      25      20
   9         11      15      16

Obtain the sums of squares and the variance estimates. Test the column means on the 
assumption that experimental condition is a fixed variable. 

3. The following are data for a double-classification experiment involving two fixed vari- 
ables: 


Apply the analysis of variance to test the significance of row, column, and interaction
effects.
4. The following are data with unequal numbers in the subclasses: 


Apply the analysis of variance to test row, column, and interaction effect on the assump- 
tion that the two experimental variables are fixed. 


CHAPTER 17 


SELECTED NONPARAMETRIC TESTS 


17.1. Introduction 


Many tests of significance involve assumptions about the nature of the 
distributions of the variables in the populations from which the samples are 
drawn. The t test and the analysis of variance, for example, assume normality
of the parent distributions. In experimental work situations arise
where either little is known about the population distributions or these 
distributions are known to depart appreciably from the normal form. In 
such situations nonparametric tests may be appropriately used. Non- 
parametric tests make few assumptions about the properties of the parent 
distributions. Assumptions about the parent distribution are involved in 
nonparametric tests, but these are usually fewer in number, weaker, and 
easier to satisfy in data situations. Nonparametric tests are frequently 
spoken of as distribution-free tests. The implication is that they are free,
or independent, of some characteristics of the population distributions. 

The reader will recall the distinction between nominal, ordinal, interval, 
and ratio variables. Nonparametric methods are appropriate for nominal 
and ordinal data; parametric methods for interval and ratio data. In 
practice, nonparametric methods are frequently used with data of this latter 
type. The data are reduced to a form such that a nominal, or ordinal, statisti- 
cal procedure may be applied to them. An important class of nonparametric 
tests employs only the sign properties of the data. All observations above 
a fixed value, such as the median, may be assigned a plus, and all below, a
minus. The original variable is replaced by, or transformed to, another 
variable which takes the sign values plus or minus. Another class of non- 
parametric test employs the rank properties of the data. The original 
observations are replaced by the numbers 1, 2, 3, . . . , N. Subsequent 
statistical manipulation and inferences are based on ranks. 

Nonparametric statistics when applied to interval and ratio data use only 
part of the information available. It is intuitively obvious that if measure- 
ments are transformed to variables employing only signs or ranks, something 
is lost in the process. In data where the assumptions required for a para- 
metric test are satisfied and both parametric and nonparametric tests may be 



applied, the nonparametric tests have less power. The power of a statistical 
test is defined as the probability of rejecting the null hypothesis when that 
hypothesis is false. The power of a test depends in part on sample size. 
Two tests, A and B, may be compared by considering the relative sample
size required to make them equally powerful. The relative efficiency of the
two tests is given by $100(N_a/N_b)$, where $N_b$ is the number of observations
required to make test B as powerful as test A with $N_a$ observations. If A
is the most powerful test available, the quantity $100(N_a/N_b)$ is called the
power efficiency of test B. The power efficiency of many nonparametric
tests is fairly high for small samples and decreases with sample size. Such 
comparisons can of course only be made for normal distributions where both 
a parametric and a nonparametric test may be applied. Since nonparametric 
tests are used where little is known about the parent distribution, the power 
of the test in most practical situations is unknown. 

For a comprehensive treatment of nonparametric tests the reader is 
referred to Siegel (1956) and to Tate and Clelland (1957). Both books 
contain useful tables. 


17.2. A Sign Test for Two Independent Samples 


This test is known as the median test. It compares the medians of two 
independent samples. The null hypothesis is that no difference exists 
between the medians of the populations from which the samples are drawn. 
The corresponding parametric test is a t test for comparing the means of
independent samples. The median test is based on the idea that in two 
samples drawn from the same population the expectation is that as many 
observations in each sample will fall above as below the joint median. 

The data consist of two independent samples of $N_1$ and $N_2$ observations.
To apply the median test the median of the combined $N_1 + N_2$ observations
is calculated. In each sample, observations above the joint median are
assigned a + and those at or below it a −. The number of + and − signs
for each sample is ascertained. A x? test is used to determine whether the 
observed frequencies of + and — signs depart significantly from expectation 
under the null hypothesis. 

The following are observations for two independent samples: 


Sample I 10 Г 10...10 322015. 174.17. 149, 20/5/1225 6595) м] 
Sample II 6 7 8 8.125 1641.19, 19... 22 


The median of the $N_1 + N_2$ observations is 16. Assigning a + to values
above the median and a — to values at or below it, we obtain 


Sample I aor Fe ae ilte tc: 2 ipee 


d 
Sample IT Later Np vU a ange 


These data may be tabulated in the form of a 2 X 2 table as follows: 


Sample I 


Sample II 


The value of $\chi^2$ for this table with Yates's correction for continuity is .51.
The value of $\chi^2$ required for significance at the 5 per cent level is 3.84.
Obviously, in this case we have no grounds for rejecting the null hypothesis 
that the samples came from populations with the same median. This is 
a two-tailed test. 
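
The median test is easily mechanized. The following sketch uses hypothetical data, not the observations of the example above; the function name `median_test` is illustrative, and the corrected $\chi^2$ is computed with the standard Yates formula for a 2 × 2 table, $N(|ad - bc| - N/2)^2 / [(a+b)(c+d)(a+c)(b+d)]$.

```python
from statistics import median


def median_test(sample_1, sample_2):
    """Median test for two independent samples.

    Returns the 2 x 2 table [[plus_1, minus_1], [plus_2, minus_2]] and
    chi-square with Yates's correction (1 degree of freedom). Values at
    or below the joint median are counted as minus.
    """
    joint = median(list(sample_1) + list(sample_2))
    a = sum(1 for x in sample_1 if x > joint)
    b = len(sample_1) - a
    c = sum(1 for x in sample_2 if x > joint)
    d = len(sample_2) - c
    n = a + b + c + d

    # Yates's correction, not allowed to push the difference below zero.
    shrunk = max(abs(a * d - b * c) - n / 2, 0.0)
    chi_sq = n * shrunk ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return [[a, b], [c, d]], chi_sq


# Hypothetical observations for illustration only.
SAMPLE_1 = [10, 12, 15, 17, 18, 20, 21, 25]
SAMPLE_2 = [6, 7, 8, 9, 12, 16, 19, 22]
table, chi_sq = median_test(SAMPLE_1, SAMPLE_2)
print(table, chi_sq)
```

The resulting value is compared with 3.84 for significance at the 5 per cent level.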


17.3. A Sign Test for Two Correlated Samples 


This test compares the medians of two correlated samples. The null 
hypothesis is that no difference exists between the medians of the popula- 
tions from which the samples are drawn. The data are comprised of a set 
of N paired observations. The test is based on the idea that under the null 
hypothesis the expectation is that half the differences between the paired 
observations will be positive and the other half negative. The symmetrical 
binomial $(\tfrac{1}{2} + \tfrac{1}{2})^N$ is used to obtain the probabilities required for a one-tailed
or a two-tailed test. 

The following are paired observations, X and Y, for a sample of 10 indi- 
viduals together with the sign of the difference between X and Y: 


X                15   19   31   36   10   11   19   15   10   16
Y                19   30   26    8   10    6   17   13   22    8
Sign of X − Y     −    −    +    +    0    +    +    +    −    +


Under the null hypothesis the probability that X is greater than Y is equal
to the probability that Y is greater than X, which in turn is equal to 1/2. The
expected numbers of + and − signs are equal. In this example we have six
plus signs, three minus signs, and one zero difference. The zero difference is
discarded. From the binomial expansion $(\tfrac{1}{2} + \tfrac{1}{2})^N$ with N = 9 we can ascertain the exact
probability of obtaining six or more plus signs under the null hypothesis. 
This probability is .254. This is a one-tailed test. The probability of 
obtaining either six or more plus signs or six or more minus signs is .508. 
This is a two-tailed test. Clearly here we have no grounds for rejecting 
the null hypothesis. 
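
The exact binomial probabilities quoted above can be verified directly. This is a minimal sketch; the function name `sign_test` is an illustrative choice, and the tail is taken on the more frequent sign.

```python
from math import comb

X = [15, 19, 31, 36, 10, 11, 19, 15, 10, 16]
Y = [19, 30, 26, 8, 10, 6, 17, 13, 22, 8]


def sign_test(x, y):
    """Sign test for paired observations; zero differences are discarded."""
    plus = sum(1 for a, b in zip(x, y) if a > b)
    minus = sum(1 for a, b in zip(x, y) if a < b)
    n = plus + minus
    k = max(plus, minus)
    # Probability of k or more like signs under the symmetrical binomial.
    one_tailed = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    two_tailed = min(1.0, 2 * one_tailed)
    return plus, minus, one_tailed, two_tailed


plus, minus, one_tailed, two_tailed = sign_test(X, Y)
print(plus, minus, round(one_tailed, 3), round(two_tailed, 3))
```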

Where N is not too small, the normal approximation to the binomial or
$\chi^2$ may be used, preferably with Yates's correction. In this case the expected
values are N/2. In the above example the observed values are 6 and 3, the
expected values are 4.5 and 4.5, the corrected observed values are 5.5 and
3.5, and $\chi^2$ = .44. The probability of obtaining a $\chi^2$ equal to or greater
than .44 under the null hypothesis is .507. Although N is small, this is in
close agreement with the exact probability of .508 obtained from the binomial.
The reader will recall that $\chi^2$ provides the probability for a two-tailed test.
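
The corrected $\chi^2$ and its probability can be computed as follows. This sketch is illustrative: the function name is hypothetical, and the probability for 1 degree of freedom is obtained from the identity $P(\chi^2 \ge x) = \operatorname{erfc}(\sqrt{x/2})$. Working from the unrounded value .444 gives a probability of about .505; entering the rounded value .44 gives the .507 quoted in the text.

```python
from math import erfc, sqrt


def sign_test_chi_square(plus, minus):
    """Chi-square with Yates's correction for the sign test (1 df)."""
    expected = (plus + minus) / 2
    chi_sq = sum(
        max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
        for observed in (plus, minus)
    )
    # Upper-tail probability of chi-square with 1 degree of freedom.
    probability = erfc(sqrt(chi_sq / 2))
    return chi_sq, probability


chi_sq, probability = sign_test_chi_square(6, 3)
print(round(chi_sq, 2), round(probability, 3))
```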


17.4. A Sign Test for k Independent Samples 


This is an obvious extension of the median test for two independent 
samples. The data are comprised of k samples of $n_1, n_2, \ldots, n_k$ observa-
tions. As before, the null hypothesis is that no difference exists in the
medians of the populations from which the samples are drawn. The median
of the combined $n_1 + n_2 + \cdots + n_k$ observations is calculated. For each
sample, observations above the joint median are assigned a + and those
either at or below the joint median a −. The data are arranged in a 2 X k
contingency table, and a $\chi^2$ test applied.

The following are data for four samples 


Sample I Hee i) ae iG. CAR ДАШ 
Sample II iau dao i eroe ДМД. с^ dp HH qu 
Sample III 48. (AB) 4.25, - 26; 9129081 

Sample IV 14/216 049. 12222 3022. 238 2185 


The total number of observations is 30. The median is 18. Assigning a + 
to values above the median and a — to values at or below, we obtain 


Sample I - = - = = = EISE 
Sample II == mim a E = EPA M 
Sample III - - Р eos xum + 

Sample IV - Ap. es Е: п Ж ы 


These data may be arranged in а 2 X 4 table as follows: 


Sample I 
Sample П 
Sample Ш 


Sample IV 


The value of $\chi^2$ calculated on this table is 7.56. The number of degrees of
freedom is (4 − 1)(2 − 1) = 3. The value of $\chi^2$ required for significance
at the 5 per cent level is 7.82. The observed value falls just below this.




17.5. A Rank Test for Two Independent Samples


Given two independent samples of $N_1$ and $N_2$ observations, the combined
$N_1 + N_2$ observations may be arranged in order. A rank 1 may be assigned
to the smallest value, a rank 2 to the next smallest, and so on. The sums of
ranks for the two samples may be obtained. Denote these by $R_1$ and $R_2$
for the samples of $N_1$ and $N_2$ cases, respectively. Assuming the samples to be
drawn from the same population, what are the expected sums of ranks?
The expected value of $R_1$ is $N_1$ times the mean of the $N_1 + N_2$ ranks and is

$$E(R_1) = \frac{N_1(N_1 + N_2 + 1)}{2} \qquad (17.1)$$

Similarly, the expected value of $R_2$ is

$$E(R_2) = \frac{N_2(N_1 + N_2 + 1)}{2} \qquad (17.2)$$

We calculate the deviation of $R_1$ or $R_2$ from the value expected on the assump-
tion that the samples are drawn from the same population. The absolute
deviations of $R_1$ and $R_2$ from expectation are equal. Consequently we need
only calculate either $R_1$ or $R_2$.

When both $N_1$ and $N_2$ are equal to or greater than 8, the sampling dis-
tribution of the deviations of $R_1$, or $R_2$, from expectation may be regarded
as approximately normal, with a mean of zero and a standard deviation of
$\sqrt{N_1 N_2 (N_1 + N_2 + 1)/12}$. The normal deviate z is then given by

$$z = \frac{R_1 - E(R_1)}{\sqrt{\dfrac{N_1 N_2 (N_1 + N_2 + 1)}{12}}} = \frac{2R_1 - N_1(N_1 + N_2 + 1)}{\sqrt{\dfrac{N_1 N_2 (N_1 + N_2 + 1)}{3}}} \qquad (17.3)$$

If the absolute value of z is equal to or greater than 1.96 or 2.58, we reject the null
hypothesis at either the .05 or .01 level and accept the alternative hypothesis
that the samples are from different populations.

Consider the following observations: 


Sample I 27 33 31.52 53 8 69 оо 17 
Sample II cux» mS 201 44 4d 4 d$ S € A 


Assigning ranks, proceeding from the smallest to the largest values, we obtain 


Sample I FS OP OMe Is) 1g v4 49 49 22 
Sample II b dedecus wb. Ipod 15 o 3H 


The sum of ranks $R_1$ for sample I is 142, and for sample II the sum $R_2$ is 111.
The expected values of $R_1$ and $R_2$ under the null hypothesis are, respectively,
115 and 138. $R_1$ is 27 points above and $R_2$ 27 points below expectation.


The normal deviate is, then,

$$z = \frac{27}{\sqrt{\dfrac{10 \times 12(10 + 12 + 1)}{12}}} = 1.78$$

Since this falls below 1.96, we have no grounds for rejecting the null
hypothesis for a two-tailed test. The result is, however, significant at the
5 per cent level for a one-tailed test.

When ties occur, the tied observations may be assigned the average of
the ranks they would occupy if no ties had occurred. If ties are fairly
numerous, a correction may be applied to the standard deviation in the
denominator of the z ratio. Corrected for ties, that ratio becomes

$$z = \frac{R_1 - E(R_1)}{\sqrt{\dfrac{N_1 N_2}{N(N - 1)} \left( \dfrac{N^3 - N}{12} - \sum T \right)}} \qquad (17.4)$$

where $N = N_1 + N_2$ and $T = (t^3 - t)/12$, where t is the number of values
tied at a particular rank. The summation of T extends over all groups of
ties.

The above procedure is appropriate for samples greater than 8. For
samples less than 8, exact probabilities may be obtained from tables based on
the exact sampling distributions. These tables require the calculation of a
statistic U, the test being known as the Mann-Whitney U test. We calculate

$$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - R_1 \qquad (17.5)$$

$$U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - R_2 \qquad (17.6)$$

These two values differ. U is taken as the smaller of the two. Tables have
been prepared by Mann and Whitney (1947) showing the probabilities
associated with different values of U for $N_1$ and $N_2$ up to 8. Extended tables
have been prepared by Auble (1953) for $N_1$ and $N_2$ up to 20. These tables
are reproduced in Siegel (1956). 
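
The normal deviate and the U statistic for the worked example follow directly from the rank sums. A minimal sketch, with an illustrative function name and no correction for ties:

```python
from math import sqrt


def rank_sum_test(n1, n2, r1, r2):
    """Normal deviate z and Mann-Whitney U from the two rank sums."""
    expected_r1 = n1 * (n1 + n2 + 1) / 2
    std_dev = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - expected_r1) / std_dev

    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return z, min(u1, u2)


# Rank sums from the example: N1 = 10, N2 = 12, R1 = 142, R2 = 111.
z, u = rank_sum_test(10, 12, 142, 111)
print(round(z, 2), u)
```

The two U values always sum to $N_1 N_2$, which provides a check on the arithmetic; here they are 33 and 87.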


17.6. A Rank Test for Two Correlated Samples 


The rank test described here for two correlated samples is due to Wilcoxon 
and is sometimes called the Wilcoxon matched-pairs signed-ranks test. The 
data are a set of N paired observations. The difference d between each pair 
is calculated. If the two observations in a pair are the same, then d = 0 
and the pair is deleted from the analysis. Values of d may be either positive 
or negative. The d’s are then ranked without regard to sign. A rank 1 is 
assigned to the smallest d, 2 to the next smallest, and so on. If two or
more d's are tied, the usual practice is adopted of assigning to the tied ranks
the average of the ranks they would have been assigned if they had differed.
The sign of the difference d is attached to each rank. If d is positive, the
rank is positive; if d is negative, the rank is negative. Under the null
hypothesis the sum of the positive ranks will tend to equal the sum of the
negative ranks. If a marked difference between the sums is observed, this
constitutes evidence for the rejection of the hypothesis that the two sets of
measurements are from the same population. The smaller of the two sums
of ranks is denoted by the letter T. Table I of the Appendix provides values
of T required at various significance levels for both a one-tailed and a two-
tailed test for N up to 25.

The following are paired observations, X and Y, for a sample of 10 
individuals: 


X        15    19    31    36    10    11    19    15    10    16
Y        19    30    26     8    10     6    17    13    22     8
d        −4   −11     5    28     0     5     2     2   −12     8
Rank     −3    −7   4.5     9         4.5   1.5   1.5    −8     6


Values of d have been calculated. One pair of observations is tied and is 
deleted from subsequent consideration. The d's are rank-ordered by abso- 
lute magnitude. The lowest values are a pair of 2's. These are assigned 
rank values of 1.5. The sum of negative ranks is 18. The sum of positive 
ranks is 27. Thus T, the smaller of the two sums, is 18. In this example
N = 9, a pair of observations having been deleted. Table I of the Appendix 
shows that for N = 9 a value of T equal to or less than 6 is required for 
significance at the 5 per cent level for a two-tailed test. These data do not 
warrant rejection of the null hypothesis. 
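
The ranking with ties and the two signed-rank sums can be reproduced with a short routine. This is a sketch only; the helper names `average_ranks` and `wilcoxon_t` are illustrative.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving ties the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1      # mean of ranks i + 1 through j + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def wilcoxon_t(x, y):
    """Wilcoxon matched-pairs signed-ranks statistic T."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), positive, negative, min(positive, negative)


X = [15, 19, 31, 36, 10, 11, 19, 15, 10, 16]
Y = [19, 30, 26, 8, 10, 6, 17, 13, 22, 8]
n, positive, negative, t = wilcoxon_t(X, Y)
print(n, positive, negative, t)
```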
For large samples, T has an approximate normal distribution with

$$\text{Mean} = \frac{N(N + 1)}{4} \qquad (17.7)$$

and

$$\text{Standard deviation} = \sqrt{\frac{N(N + 1)(2N + 1)}{24}} \qquad (17.8)$$

The normal deviate z is given by

$$z = \frac{T - \dfrac{N(N + 1)}{4}}{\sqrt{\dfrac{N(N + 1)(2N + 1)}{24}}} \qquad (17.9)$$

Values of 1.96 and 2.58 are, as usual, required for significance at the 5 per
cent and 1 per cent levels for a two-tailed test.


17.7. A Rank Test for k Independent Samples 


A rank test for k independent samples is the Kruskal-Wallis (1952) one-way
analysis of variance by ranks. The null hypothesis is that the k independent
samples are from the same population. To apply this test all the observa-
tions for the k samples are ranked. The lowest value is assigned a rank of 1,
the next lowest 2, and so on. The sum of ranks $R_i$ for each of the k samples
is obtained. A statistic H is calculated from the data. This is defined by

$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} \frac{R_i^{2}}{n_i} - 3(N + 1) \qquad (17.10)$$

where $n_i$ = number of observations in sample i
      N = total number of observations
      $R_i$ = sum of ranks for sample i

For samples of reasonable size this statistic has a chi-square distribution with
k − 1 degrees of freedom and may be referred to any table of $\chi^2$. In this
context reasonable size may be interpreted to mean more than five cases in
the groups. For k = 3 and $n_i \le 5$, tables of exact probabilities have been
prepared by Kruskal and Wallis.

When ties occur, the usual convention is adopted of assigning to the tied
observations the average of the ranks they would otherwise occupy. The
value of H is then divided by

$$1 - \frac{\sum T}{N^3 - N}$$

where $T = t^3 - t$, and t is the number of tied observations in a group.
The quantity H corrected for ties is

$$H = \frac{\dfrac{12}{N(N + 1)} \displaystyle\sum_{i=1}^{k} \dfrac{R_i^{2}}{n_i} - 3(N + 1)}{1 - \dfrac{\sum T}{N^3 - N}} \qquad (17.11)$$

The correction for ties will increase the value of H.
The following are data for three samples: 

Sample I 3 7 11 16 22 29 31 36 

Sample II 3 4 7 18 19 32 

Sample IIT 22 38 46 47 47 50 53 54 56 


In this example $n_1 = 8$, $n_2 = 6$, $n_3 = 9$, and N = 8 + 6 + 9 = 23. All 23
observations are ranked to obtain

Sample I      1.5   4.5    6      7     10.5   12     13     15
Sample II     1.5   3      4.5    8      9     14
Sample III   10.5  16     17     18.5   18.5   20     21     22     23

The sums of ranks are calculated. These are $R_1 = 69.5$, $R_2 = 40$, and
$R_3 = 166.5$. We note that we have four sets of ties of two observations
each. For each set $T = 2^3 - 2 = 6$, and for the four sets $\sum T = 24$. The value of H
is then

$$H = \frac{\dfrac{12}{23(23 + 1)} \left( \dfrac{69.5^{2}}{8} + \dfrac{40^{2}}{6} + \dfrac{166.5^{2}}{9} \right) - 3(23 + 1)}{1 - \dfrac{24}{23^{3} - 23}} = \frac{13.88}{.998} = 13.91$$

The uncorrected value of H is 13.88. In this example the effect of the
correction for ties is negligible and may for
all practical purposes be ignored. On reference to a table of $\chi^2$ with df = 2,
we note that an H of 13.91 is significant at better than the 1 per cent level.
We may then reject the hypothesis that the samples are from the same 
population. 
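
The ranking, the rank sums, and both the uncorrected and corrected values of H can be checked with a short routine. This sketch is illustrative; the helper names are hypothetical. With these data it yields 13.88 for the uncorrected H and 13.91 after the correction for ties.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from smallest to largest, giving ties the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(samples):
    """Rank sums, H, and H corrected for ties, for k independent samples."""
    pooled = [x for sample in samples for x in sample]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    rank_sums, start = [], 0
    for sample in samples:
        rank_sums.append(sum(ranks[start:start + len(sample)]))
        start += len(sample)

    h = (12 / (n_total * (n_total + 1))
         * sum(r * r / len(s) for r, s in zip(rank_sums, samples))
         - 3 * (n_total + 1))

    # T = t^3 - t for each group of t tied values (zero when t = 1).
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values())
    h_corrected = h / (1 - tie_sum / (n_total ** 3 - n_total))
    return rank_sums, h, h_corrected


SAMPLES = [
    [3, 7, 11, 16, 22, 29, 31, 36],
    [3, 4, 7, 18, 19, 32],
    [22, 38, 46, 47, 47, 50, 53, 54, 56],
]
rank_sums, h, h_corrected = kruskal_wallis(SAMPLES)
print(rank_sums, round(h, 2), round(h_corrected, 2))
```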


17.8. A Rank Test for k Correlated Samples 


A rank test for k correlated samples is the Friedman two-way analysis of
variance by ranks (1937). The data are a set of k observations for a sample 
of N individuals. Such data arise in many experiments where subjects 
are tested under a number of different experimental conditions. The 
corresponding parametric test is an analysis of variance for two-way classi- 
fication where observations are made on each of a group of individuals under 
more than two conditions. If there is reason to believe that the assumptions 
underlying the analysis of variance are not satisfied by the data, the Friedman 
rank method is appropriate. 

The data are arranged in a table containing $N$ rows and $k$ columns. The
rows correspond to individuals, or groups, and the columns to experimental
conditions. Table 17.1 shows such an arrangement of data for eight subjects
tested under four experimental conditions. The observations in the rows are
ordered as shown in Table 17.2.

TABLE 17.1
MATERIAL RECALLED AFTER FOUR TIME INTERVALS FOR A GROUP OF EIGHT SUBJECTS

                    Time interval
Subject      I            II     III    IV
   1         4            5      9      3
   2         [row illegible]
   3         7            13     14     6
   4         16           12     14     10
   5         2            4      7      6
   6         1            4      5      3
   7         [illegible]  6      7      9
   8         5            7      8      9

TABLE 17.2
RANKS ASSIGNED BY ROWS FOR THE DATA OF TABLE 17.1

                    Time interval
Subject      I      II     III    IV
   1         2      3      4      1
   2         2      3      4      1
   3         2      3      4      1
   4         4      2      3      1
   5         1      2      4      3
   6         1      3      4      2
   7         1      2      3      4
   8         1      2      3      4
  R_i        14     20     29     17

For example, the four observations in the
top row are 4, 5, 9, and 3. These are replaced by the ranks 2, 3, 4, and 1.
The ranks in each column are summed. If the samples are from the same 
population, the ranks in each column will be a random arrangement of the 
numbers 1, 2, 3, and 4. Under these circumstances the sums of ranks for 
columns will tend to be the same. If these sums differ significantly, the 
hypothesis that they are from the same population may be rejected. 
The test to be applied to the column sums of ranks is a chi-square test.
We calculate the quantity

$$\chi_r^2 = \frac{12}{Nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3N(k+1) \tag{17.12}$$

where $k$ = number of conditions
      $N$ = number of individuals
      $R_i$ = rank sum for column $i$
$\chi_r^2$ has an approximate chi-square distribution with $k - 1$ degrees of freedom.
For the data of Table 17.2 we have

$$\chi_r^2 = \frac{12}{8 \times 4(4+1)}\left(14^2 + 20^2 + 29^2 + 17^2\right) - 3 \times 8(4+1) = 9.45$$

This result for $df = 4 - 1 = 3$ falls between the 5 and 1 per cent levels of
significance. Actually it is a little above the 2 per cent level. If this level 
of confidence is acceptable, we may conclude that the samples are not drawn 
from the same population and that a difference in the experimental conditions 
is exerting an effect. 
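The Friedman statistic can be reproduced from the row ranks of Table 17.2. The following is a minimal sketch; the function name `friedman_chi_square` is illustrative only, and the input is assumed to be already ranked within rows.

```python
def friedman_chi_square(rank_rows):
    """Return (column rank sums, chi_r^2) for an N x k table of row ranks."""
    n = len(rank_rows)          # number of individuals
    k = len(rank_rows[0])       # number of conditions
    col_sums = [sum(row[c] for row in rank_rows) for c in range(k)]
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in col_sums) - 3 * n * (k + 1)
    return col_sums, stat

# Ranks by rows, copied from Table 17.2.
ranks = [
    [2, 3, 4, 1],
    [2, 3, 4, 1],
    [2, 3, 4, 1],
    [4, 2, 3, 1],
    [1, 2, 4, 3],
    [1, 3, 4, 2],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]
col_sums, chi_r2 = friedman_chi_square(ranks)
print(col_sums, round(chi_r2, 2))
```

The column sums come out as 14, 20, 29, and 17, and the statistic as 9.45, in agreement with the worked example.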




Exact probabilities are available for k = 3, N = 2 to 9 and for k = 4, 
N = 2 to 4. These tables are given by Friedman (1937) and Siegel (1956). 

Where ties occur the tied observations may be assigned the average of
the ranks they would otherwise occupy.


EXERCISES 
1. The following are data for two groups of experimental animals: 
Group I 104 109 127 143 186 204 209 266 277 
Group II 62 82 89 90 101 106 109 109 205 


Apply a sign test to test the hypothesis that the two samples come from populations with 
the same median. 

2. The following are data for a sample of nine animals tested under control and experimental conditions:


Control 21 24 26 32 55 82 46 55 88
Experimental 18 9 23 26 82 19 42 30 62


Test the significance of the difference between the two medians using a sign test. 
3. Apply a sign test to the data of Exercise 2, Chap. 15. 
4. Apply the Mann-Whitney U test to the data of Exercise 1 above. 
5. Apply the Wilcoxon matched-pairs signed-ranks test to the data of Exercise 2 above. 
6. Apply the Kruskal-Wallis one-way analysis of variance by ranks to the data of Exercise 
2, Chap. 15. 
7. Apply the Friedman two-way analysis of variance by ranks to the data of Table 16.4. 


CHAPTER 18


ERRORS OF MEASUREMENT 


18.1. The Nature of Error 


The measurements obtained in the conduct of experiments are subject 
to error of greater or less degree. In measuring the activity of a rat, the 
intelligence of a child, or the response latency of an experimental subject, 
we may assume that the individual measurements are subject to some error. 
In general, the concept of error always implies a true, fixed, standard, or 
parametric value which we wish to estimate and from which an observed 
measurement may differ by some amount. The difference between a true 
value and an observed value is an error. If we represent a particular 
observation by $X_i$, the true value which it purports to estimate by $T_i$, and
an error by $e_i$, we may write

$$e_i = X_i - T_i \tag{18.1}$$


where e; may take either positive or negative values. 

A distinction may be made between systematic and random error. Obser- 
vations which consistently overestimate or underestimate the true value are 
subject to systematic error. A stop watch which underestimates time
intervals will yield observations with systematic errors. A random error 
exhibits no systematic tendency to be either positive or negative and is 
assumed to average to zero over a large number of subjects or trials. Random 
errors are also assumed to be uncorrelated both with true scores and with 
each other. The discussion in this chapter is concerned exclusively with 
random errors. 

Any definition of error as the difference between an observed and true 
value is meaningless unless a precise definition is attached to the concept of 
true value. In theory a true value is sometimes conceptualized as the mean 
of an indefinitely large number of measurements of an attribute made under 
conditions such that the true value remains constant, and the procedures 
used in making the measurements do not change from trial to trial in any 
known systematic fashion. In mathematical language the true value may
be defined as

$$T = \lim_{K \to \infty} \frac{1}{K} \sum_{j=1}^{K} X_j$$

where $X_j$ refers to the $j$th measurement. Thus the true value is the limit
approached by the arithmetic mean as the number of repeated observations
$K$ is increased indefinitely. This concept of true value is appropriate for
the measurement of physical quantities. For example, a yardstick may be 
used to measure the length of a desk. The measurement procedure may be 
repeated many times, and the variation in the observations attributed to 
error. It may be assumed that a considerable number of repeated obser- 
vations may be made under fairly constant conditions, neither the desk nor 
the yardstick changing in any systematic way. By increasing the number of 
observations and taking their mean, the error in estimating the true value 
may be reduced. Theoretically, this error may be made as small as we like 
by increasing the number of observations. As the number of observations 
becomes indefinitely large, the mean approaches the true value as a limit. 

Questions may be raised about the appropriateness of this concept of true 
value in the measurement of psychological quantities. Clearly, in the 
measurement of human behavior the making of a large number of repeated 
observations is usually not possible. The attribute being measured may 
fluctuate or change markedly with time, or the process of repeated measure- 
ment may modify the attribute under study. For example, in measuring 
the intelligence of a child, it is obviously out of the question to administer 
the same intelligence test 100 times to obtain an estimate of error. Quite 
apart from the labor involved in such estimation, the results obtained would 
be invalidated by practice, fatigue, and other effects. This circumstance
has given rise in psychological work to a variety of procedures for estimating 
error other than by a series of repeated measurements. Despite the opera- 
tional impracticality in psychology of estimating error by making a large 
number of repeated measurements, the concept of true score as the mean of 
an indefinitely large number of such measurements is still a necessary and 
important concept in the study of errors of measurement. Here we note that 
the role of true score is analogous to that of population parameter in sampling 
statistics. The difference between the sample statistic and the population 
parameter is a sampling error. By increasing sample size the magnitude of 
sampling error is reduced. For an infinite population an unbiased sample 
statistic will approach the population parameter as a limit as the sample 
becomes indefinitely large. A sampling error is an error associated with a 
statistic based on a sample of observations. An error of measurement is 
usually construed to be an error associated with a particular observation 
which is an estimate of a true value. In most instances both population
parameters and true values cannot be known but can only be estimated from 
fallible data. This circumstance does not detract from the meaningfulness 
of, and necessity for, these concepts, nor does it prevent the making of 
meaningful statements about the magnitude of error. A concept of true 
value, however defined, is a logical necessity for any theory of error. 




18.2. Effect of Measurement Error on the Mean and Variance 


Consider a population of measurements. Each measurement is subject
to error and may be written as

$$X_i = T_i + e_i$$

where $X_i$ is the observed and $T_i$ the true measurement. By summation over
all members of the population we obtain

$$\sum X_i = \sum T_i + \sum e_i$$

If we assume that measurement error is random, and as often positive as
negative, we may write $\sum e_i = 0$. Consequently, the sum of measurements
subject to error is equal to the sum of true measurements. It follows also
that the means of the observed and true values are equal, both being equal
to the population mean $\mu$. We conclude that measurement error exerts no
systematic effect on the arithmetic mean. A mean based on a sample of
$N$ measurements will exhibit no tendency to be either greater than or less
than the mean of true measurements. The expectations of the mean of
observed and true scores are equal to the population mean $\mu$; that is,

$$E(\bar{X}) = E(\bar{T}) = \mu \tag{18.2}$$


Measurement error exerts an effect on the sampling variance of the arith- 
metic mean. This point is discussed in Sec. 18.7. 

Measurement error exerts a systematic effect on the variance. We may
write

$$(X_i - \mu) = (T_i - \mu) + e_i$$

If we square this identity, sum over all members of the population, and divide
by $N_p$, where $N_p$ is the number of members in the population, we obtain

$$\frac{\sum (X_i - \mu)^2}{N_p} = \frac{\sum (T_i - \mu)^2}{N_p} + \frac{\sum e_i^2}{N_p} + \frac{2\sum (T_i - \mu)e_i}{N_p}$$

On the assumption that measurement errors are random and uncorrelated
with true scores, the third term to the right is equal to zero, and we may write

$$\sigma_x^2 = \sigma_T^2 + \sigma_e^2 \tag{18.3}$$

Thus the variance of observed scores is equal to the variance of true scores
plus the variance of the errors of measurement. For a fixed $\sigma_T^2$, the more
inaccurate the measurements the greater the value of $\sigma_e^2$ and the greater
the variance $\sigma_x^2$.
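A small numerical check may make the additivity of variances concrete. The sketch below uses invented true scores and errors (they are not data from the text), chosen so that the errors sum to zero and are uncorrelated with the true scores, as the derivation assumes.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance (divide by N), matching the derivation above."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical values: errors sum to zero within each true-score level,
# so they are uncorrelated with the true scores.
true_scores = [10, 10, 20, 20]
errors = [1, -1, 1, -1]
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = variance(true_scores)
var_error = variance(errors)
var_observed = variance(observed)
print(var_observed, var_true + var_error)
```

Here the observed variance (26) equals the true variance (25) plus the error variance (1). With errors that are correlated with true scores the cross-product term would not vanish and the equality would fail.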




18.3. The Reliability Coefficient


Consider a situation where each member of a population has been measured 
on two separate occasions. Two observations are available for each member. 
Both are presumed to be measures of the same attribute, and both are subject 
to error. We may write 

$$X_{i1} = T_i + e_{i1}$$
$$X_{i2} = T_i + e_{i2}$$

In deviation form these become

$$(X_{i1} - \mu) = (T_i - \mu) + e_{i1}$$
$$(X_{i2} - \mu) = (T_i - \mu) + e_{i2}$$

By multiplying these two equations, summing over a population of $N_p$
members, and dividing by $N_p \sigma_1 \sigma_2$, we obtain

$$\rho_{xx} = \frac{\sum (X_{i1} - \mu)(X_{i2} - \mu)}{N_p \sigma_1 \sigma_2} = \frac{\sum (T_i - \mu)^2 + \sum e_{i1}e_{i2} + \sum e_{i1}(T_i - \mu) + \sum e_{i2}(T_i - \mu)}{N_p \sigma_1 \sigma_2}$$

On the assumption that errors are random and uncorrelated with each other
and with true scores, the three terms on the right in the numerator are equal
to zero. Because the paired observations are measures of the same attribute,
$\sigma_1 = \sigma_2$. Also $\sum (T_i - \mu)^2 = N_p \sigma_T^2$. Hence, writing $\sigma_x = \sigma_1 = \sigma_2$,

$$\rho_{xx} = \frac{\sigma_T^2}{\sigma_x^2} \tag{18.4}$$

where $\rho_{xx}$ is the reliability coefficient. The reliability coefficient is a simple
proportion. It is the proportion of obtained variance that is true variance.
If $\sigma_x^2 = 400$ and $\sigma_T^2 = 360$, the reliability coefficient $\rho_{xx} = .90$. This means
that 90 per cent of the variation in the measurements is attributable to
variation in true score, the remaining 10 per cent being attributable to error.
Where sample estimates are used we may write

$$r_{xx} = \frac{s_T^2}{s_x^2} \tag{18.5}$$

where $r_{xx}$ is the sample estimate of the reliability coefficient.
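The identity between the correlation of two fallible measurements and the ratio of true to obtained variance can be illustrated numerically. The sketch below uses invented scores (not data from the text) built so that the two sets of errors are uncorrelated with each other and with the true scores; `pearson_r` and `variance` are helper names introduced here for illustration.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical true scores and two sets of mutually uncorrelated errors.
true_scores = [10] * 4 + [20] * 4
errors_1 = [1, 1, -1, -1] * 2
errors_2 = [1, -1, 1, -1] * 2
x1 = [t + e for t, e in zip(true_scores, errors_1)]
x2 = [t + e for t, e in zip(true_scores, errors_2)]

r_xx = pearson_r(x1, x2)                           # correlation of the two measures
ratio = variance(true_scores) / variance(x1)       # true variance / obtained variance
print(round(r_xx, 4), round(ratio, 4))
```

Both quantities equal 25/26 for this constructed example. In real data the assumptions hold only approximately, so the two values would agree only approximately.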


18.4. Methods for Determining Reliability 


Above, the reliability coefficient has been discussed without reference to
methods for obtaining such coefficients in practice. A number of different
practical methods for determining reliability are used. These methods are
as follows:

1. Test-retest method. The same measuring instrument is applied on two 
occasions to the same sample of individuals. When the instrument is a 
psychological test, the test is administered twice to a sample of individuals 
and the scores correlated. 

2. Parallel-forms method. Parallel or equivalent forms of a test may be 
administered to the same group of subjects, and the paired observations 
correlated. Criteria of parallelism are required. 

3. Split-half method. This method is appropriate where the testing 
procedure may in some fashion be divided into two halves and two scores 
obtained. These may be correlated. With psychological tests a common 
procedure is to obtain scores on the odd and even items. 

4. Internal-consistency methods. These are used with psychological tests 
comprised of a series of items, usually dichotomously scored, a 1 being 
assigned for a pass and a 0 for a failure. These methods require a knowledge 
of certain test-item statistics. 

The interpretation of a reliability coefficient depends on the method used 
to obtain it. When the same test is administered twice to the same group 
with a time interval separating the two administrations, some variation, 
fluctuation, or change in the ability or function measured may occur. The 
departure of $r_{xx}$ from unity may be construed to result in part from error
and in part from changes in the ability or function measured. With many
psychological tests the value of $r_{xx}$ will show a systematic decrease with
increase in the time interval separating the two administrations. When 
the time interval is short, memory effects may operate. The subject may 
recall many of his previous responses and proceed to reproduce them. A 
spuriously high correspondence between measurements obtained at the two 
testings may thereby result. Regardless of the time interval separating the 
two testings, varying environmental conditions such as noise, temperature, 
and other factors may affect the result obtained. Likewise, varying physio- 
logical factors, fatigue and the like, may exert an influence. 

In estimating reliability by the administration of parallel or equivalent 
forms of a test, criteria of parallelism are required. Test content, type of 
item, instructions for administering, and the like, should be similar for the 
different forms. Also the parallel forms should have approximately equal 
means and standard deviations. In addition, the intercorrelations should 
be equal. Thus with three parallel tests the intercorrelations should be such 
that $r_{12} = r_{13} = r_{23}$. A discussion of criteria for parallel tests is given by
Gulliksen (1950). Situations arise where a large pool or population of test 
items is available. Samples of items may be drawn at random. Each 
sample of items is a randomly parallel form. This approach to the develop- 
ment of parallel tests has been studied at length by Lord (1955a, 1955b). 




In many situations a single administration only of a test may be possible. 
The test is divided into two halves. A not uncommon procedure is to divide 
a test into odd and even items. Scores are obtained on the two halves, and 
these are correlated. The result is a reliability coefficient for a half test. 
Given a reliability coefficient for a half test, the reliability coefficient for a
whole test may be estimated using the Spearman-Brown formula. This
formula is

$$r_{xx} = \frac{2r_{hh}}{1 + r_{hh}} \tag{18.6}$$

where $r_{hh}$ is the reliability of a half test. If, for example, $r_{hh} = .80$, then
$r_{xx} = .89$. The Spearman-Brown formula provides an estimate of the
reliability of the whole test. It estimates what the reliability would be if
each test half were made twice as long.
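The odd-even procedure and the step-up to whole-test reliability can be sketched as follows. The item matrix is invented for illustration (rows are persons, columns are items scored 1 or 0), and the helper names are not from the text.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown_half(r_half):
    """Whole-test reliability estimated from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

# Hypothetical responses of five persons to a four-item test.
items = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
odd_scores = [sum(row[0::2]) for row in items]    # items 1 and 3
even_scores = [sum(row[1::2]) for row in items]   # items 2 and 4
r_half = pearson_r(odd_scores, even_scores)
r_full = spearman_brown_half(r_half)
print(round(r_half, 3), round(r_full, 3))
print(round(spearman_brown_half(0.80), 2))        # the example in the text
```

For this toy matrix the half-test correlation is 11/14 and the stepped-up estimate is .88. A different split of the same items would in general give a different value, which is the difficulty discussed later in this section.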

The split-half method should not be used with highly speeded test material. 
Obviously, if a test is comprised of easy items, and a subject is required to 
complete as many items as possible within a limited time interval, and all 
or nearly all items are correct, the scores on the two halves would be about 
the same and the correlation would be close to +1.00. 

A method of obtaining reliability coefficients using test-item statistics has 
been developed by Kuder and Richardson (1937). Many psychological 
tests are constructed of dichotomously scored items. An individual either 
passes or fails the item. A 1 is assigned for a pass, and 0 for a failure. The
score is the number of items done correctly. The proportion of individuals
passing item $i$ is denoted by the symbol $p_i$, and the proportion failing, by
$q_i$, where $q_i = 1 - p_i$. An estimate of reliability is given by

$$r_{xx} = \frac{n}{n-1} \cdot \frac{s_x^2 - \sum_{i=1}^{n} p_i q_i}{s_x^2} \tag{18.7}$$

where $n$ = number of test items
      $s_x^2$ = variance of scores on test
      $p_i q_i$ = product of proportion of passes and fails for item $i$
      $\sum_{i=1}^{n} p_i q_i$ = sum of these products for $n$ items
This formula is frequently referred to as Kuder-Richardson formula 20.
The coefficient $r_{xx}$ computed by this formula will take values ranging from
zero to unity. If the responses of individuals to the test items are assigned
at random, the expectation of $s_x^2$ is equal to $\sum_{i=1}^{n} p_i q_i$ and the expectation of
$r_{xx}$ is zero. If all items are perfectly correlated, a situation which can only
arise when all have the same difficulty, $r_{xx} = 1$. The correlation between
items is the phi coefficient.
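Kuder-Richardson formula 20 is straightforward to compute from a matrix of item scores. The sketch below is illustrative: the function name `kr20` and both data sets are invented, and the score variance is taken with divisor $N$ so that it is on the same footing as the $p_i q_i$ terms.

```python
def kr20(item_matrix):
    """Kuder-Richardson formula 20 for rows = persons, columns = 0/1 items."""
    n_persons = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in item_matrix) / n_persons
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (var_total - sum_pq) / var_total

# Hypothetical data: items of graded difficulty.
graded = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical data: three perfectly correlated items of equal difficulty.
identical = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]

print(round(kr20(graded), 3), round(kr20(identical), 3))
```

The graded items give a coefficient of .80. The perfectly correlated items give 1, which illustrates the limiting case described above.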

If all assumptions implicit in the split-half method of estimating reliability 
coefficients are satisfied, the split-half and Kuder-Richardson formula 20 
will yield identical results (Ferguson, 1951). Because these assumptions are 
rarely, if ever, satisfied in practice, differences in the coefficients obtained 
will result. One difficulty with the split-half method is that a test may be 
split in a great many ways, yielding many different values of $r_{xx}$. It may be
shown that if a test is split in all possible ways, the average of all the split-half 
reliability coefficients with the Spearman-Brown correction is the Kuder- 
Richardson formula 20. This coefficient has a simple unique value for any 
particular test. 

The Kuder-Richardson formula 20 is a measure of the internal consistency, 
or homogeneity, or scalability, of the test material. In this context these 
three terms may be considered synonymous. If the items on a test have high
intercorrelations with each other and are measures of much the same attri- 
bute, then the reliability coefficient will be high. If the intercorrelations 
are low, either because the items measure different attributes or because of 
the presence of error, then the reliability coefficient will be low. 

The Kuder-Richardson formula 20 may be applied to tests comprised of 
items which elicit more than two categories of response. Personality and 
interest inventories and attitude scales frequently permit three or more 
response categories. For a dichotomously scored item we note that $p_i q_i$ is
the item variance $s_i^2$ and $\sum_{i=1}^{n} p_i q_i = \sum_{i=1}^{n} s_i^2$, the sum of the item variances.
For an item with more than two response categories, where each category
has been assigned a weight, the individual item variances may be calculated
and their sum may be substituted in Kuder-Richardson formula 20 for
$\sum_{i=1}^{n} p_i q_i$. Consider a test comprised of statements which elicit the possible
responses "agree," "undecided," "disagree." Let $p_1$, $p_2$, and $p_3$ be the
proportion of individuals responding in the three categories. If weights
3, 2, 1 or +1, 0, -1, or any other system of weights, are assigned to the
categories, the item variance may be calculated. These may be summed,
and the sum substituted for $\sum_{i=1}^{n} p_i q_i$. The quantity $s_x^2$ is, of course, the
variance of scores obtained by summing items with the assigned weights.
For further discussion see Ferguson (1951).
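The substitution of item variances for the $p_i q_i$ terms can be sketched in a few lines. The function name `kr20_from_item_variances` and the data are invented for illustration; variances use divisor $N$ throughout.

```python
def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def kr20_from_item_variances(score_matrix):
    """Formula 20 with the sum of item variances in place of the sum of p*q."""
    n_items = len(score_matrix[0])
    totals = [sum(row) for row in score_matrix]
    item_vars = [variance([row[i] for row in score_matrix]) for i in range(n_items)]
    var_total = variance(totals)
    return n_items / (n_items - 1) * (var_total - sum(item_vars)) / var_total

# Hypothetical dichotomous data: the item variance of a 0/1 item is p*q,
# so the result agrees with the ordinary formula 20.
dichotomous = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical three-category items weighted 3, 2, 1 (agree, undecided, disagree).
weighted = [[3, 3], [2, 2], [1, 1], [2, 2]]

print(round(kr20_from_item_variances(dichotomous), 3))
print(round(kr20_from_item_variances(weighted), 3))
```

The dichotomous example returns .80, the value the $p_i q_i$ form gives for the same data, and the two perfectly correlated weighted items return 1.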
On the assumption that all test items are of equal difficulty, a simplified
form of the Kuder-Richardson formula may be obtained for use with dichotomously
scored test items. This formula may be written as

$$r_{xx} = \frac{n}{n-1}\left(1 - \frac{n\bar{X} - \bar{X}^2}{n s_x^2}\right) \tag{18.8}$$

where $\bar{X}$ is the mean test score and $s_x^2$ is the variance. This formula is
referred to as Kuder-Richardson formula 21. The formula may be derived
using the assumptions implicit in the concept of randomly parallel tests
(Sec. 18.10).
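Formula 21 needs only the number of items, the mean, and the variance of the test scores. A minimal sketch follows; the function name `kr21` and the numerical inputs are illustrative and are not taken from the text.

```python
def kr21(n_items, mean_score, var_score):
    """Kuder-Richardson formula 21 from summary statistics of test scores."""
    return n_items / (n_items - 1) * (
        1 - (n_items * mean_score - mean_score ** 2) / (n_items * var_score)
    )

# Hypothetical four-item test with scores 4, 3, 2, 1, 0: mean 2, variance 2.
graded_difficulty = kr21(4, 2.0, 2.0)
# Hypothetical three-item test with scores 3, 0, 3, 0: mean 1.5, variance 2.25.
equal_difficulty = kr21(3, 1.5, 2.25)
print(round(graded_difficulty, 3), round(equal_difficulty, 3))
```

The first value is about .667. If those scores arise from items passed by 80, 60, 40, and 20 per cent of the group, formula 20 gives .80, so formula 21 is the lower of the two when difficulties differ. The second case, with items of equal difficulty, gives 1, as formula 20 would for the same scores.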


18.5. Estimating Reliability Coefficients 


In determining reliability coefficients by the test-retest, parallel-forms, 
or split-half methods, product-moment correlations are usually calculated 
between the paired observations. Jackson and Ferguson (1941) have shown
that modified procedures for estimating $\rho_{xx}$ are preferable. In the estimation
of population parameters the method of estimation used depends
on the conditions which are assumed to exist in the population from
which the samples are drawn. In estimating $\rho_{xx}$ three situations may be
recognized.

Case 1. Neither the two standard deviations nor the two means are
assumed to be equal.

Case 2. The two standard deviations, but not the two means, are assumed
to be equal.

Case 3. Both the two standard deviations and the two means are assumed
to be equal.

For Case 1 the usual formula for the product-moment correlation coefficient
is appropriate. In Cases 2 and 3, the usual product-moment formula
does not yield the maximum likelihood estimate of $\rho_{xx}$. The method of
maximum likelihood, developed by R. A. Fisher and preferred by many
statisticians, is a method of estimation which maximizes the probability of
the observed event.

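For Case 2 Jackson and Ferguson (1941) have shown that where the
assumption is made that $\sigma_1 = \sigma_2 = \sigma$, the maximum likelihood estimates
$\hat{\sigma}$ and $\hat{\rho}_{xx}$ of $\sigma$ and $\rho_{xx}$ are

$$\hat{\sigma} = \sqrt{\frac{1}{2N}\left[\sum X_{i1}^2 + \sum X_{i2}^2 - \frac{(\sum X_{i1})^2 + (\sum X_{i2})^2}{N}\right]} \tag{18.9}$$

$$\hat{\rho}_{xx} = \frac{2\left[\sum X_{i1}X_{i2} - \dfrac{\sum X_{i1}\sum X_{i2}}{N}\right]}{\sum X_{i1}^2 + \sum X_{i2}^2 - \dfrac{(\sum X_{i1})^2 + (\sum X_{i2})^2}{N}} \tag{18.10}$$

For Case 3, where both $\mu_1 = \mu_2 = \mu$ and $\sigma_1 = \sigma_2 = \sigma$ are assumed, the
maximum likelihood estimates of $\sigma$ and $\rho_{xx}$ are

$$\hat{\sigma} = \sqrt{\frac{1}{2N}\left[\sum X_{i1}^2 + \sum X_{i2}^2 - \frac{(\sum X_{i1} + \sum X_{i2})^2}{2N}\right]} \tag{18.11}$$

$$\hat{\rho}_{xx} = \frac{2\sum X_{i1}X_{i2} - \dfrac{(\sum X_{i1} + \sum X_{i2})^2}{2N}}{\sum X_{i1}^2 + \sum X_{i2}^2 - \dfrac{(\sum X_{i1} + \sum X_{i2})^2}{2N}} \tag{18.12}$$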


Jackson and Ferguson suggest that the Case 2 formulas should be used 
in estimating reliability coefficients by the test-retest or parallel-forms
method and the Case 3 formulas for the split-half method. It seems appro- 
priate to use the Case 3 formulas in all situations where the means and 
variances are not significantly different one from another. Where the cri- 
teria of parallelism are satisfied, the Case 3 formulas are clearly appropriate. 
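The three cases can be compared on a small set of paired scores. The sketch below assumes the usual maximum likelihood forms for paired normal data: with equal variances (Case 2) the estimate is twice the covariance over the sum of the two variances, and with equal means and variances (Case 3) deviations are taken from the common mean. The function names and the data are invented for illustration.

```python
import math

def pearson_r(x1, x2):
    """Case 1: ordinary product-moment correlation."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sp = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    ss1 = sum((a - m1) ** 2 for a in x1)
    ss2 = sum((b - m2) ** 2 for b in x2)
    return sp / math.sqrt(ss1 * ss2)

def case2_reliability(x1, x2):
    """Case 2: equal standard deviations assumed, means free to differ."""
    n = len(x1)
    s1, s2 = sum(x1), sum(x2)
    sum_sq = sum(a * a for a in x1) + sum(b * b for b in x2)
    sum_prod = sum(a * b for a, b in zip(x1, x2))
    return 2 * (sum_prod - s1 * s2 / n) / (sum_sq - (s1 * s1 + s2 * s2) / n)

def case3_reliability(x1, x2):
    """Case 3: equal means and equal standard deviations assumed."""
    n = len(x1)
    sum_sq = sum(a * a for a in x1) + sum(b * b for b in x2)
    sum_prod = sum(a * b for a, b in zip(x1, x2))
    correction = (sum(x1) + sum(x2)) ** 2 / (2 * n)
    return (2 * sum_prod - correction) / (sum_sq - correction)

# Hypothetical paired scores for five individuals on two occasions.
first = [10, 12, 15, 9, 14]
second = [11, 13, 14, 10, 16]
r1 = pearson_r(first, second)
r2 = case2_reliability(first, second)
r3 = case3_reliability(first, second)
print(round(r1, 3), round(r2, 3), round(r3, 3))
```

For these invented scores the three estimates decrease from Case 1 to Case 3, because each added assumption treats a further kind of discrepancy between the two sets of scores (unequal spread, then unequal level) as disagreement.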


18.6. Effect of Test Length on the Reliability Coefficient 


In discussing split-half reliability, a formula was given for estimating the 
reliability of a whole test from the reliability of a half test. This formula 
is a particular case of a more general Spearman-Brown formula for esti- 
mating increased reliability with increased test length. The more general 
formula is 


$$r_{kk} = \frac{k r_{xx}}{1 + (k - 1) r_{xx}} \tag{18.13}$$

where $r_{xx}$ = an estimate of reliability of a test of unit length
      $r_{kk}$ = reliability of test made $k$ times as long
If $r_{xx} = .60$ and the test is made four times as long, the reliability coefficient
$r_{kk}$ for the lengthened test is estimated as .86. From a theoretical point of
view a test may be made as reliable as we like by increasing its length.
Practical considerations, of course, restrict test length. 

Because reliability is a function of test length, reliability coefficients calcu- 
lated on tests of different lengths are, for certain purposes, not directly 
comparable. If, for example, we wish to compare the reliability of different 
types of test material, we presumably should require measures which were 
independent of the differing lengths of the tests. One procedure here 
is to use the Spearman-Brown formula and calculate reliability coefficients
for a standard test of 100 items. If a test has 40 items, then a value of
$k = 100/40 = 2.50$ would be used in estimating the reliability of the standard
test. If another test has 150 items, then $k = 100/150 = .67$, and so on. Thus
a comparison of the reliabilities of different tests may be made which is 
independent of differing test lengths. 
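The general Spearman-Brown formula and the conversion to a standard length can be sketched as below. The function names are illustrative, and the reliabilities assigned to the 40-item and 150-item tests are invented, since the text gives only their lengths.

```python
def spearman_brown(r_xx, k):
    """Estimated reliability of a test made k times as long."""
    return k * r_xx / (1 + (k - 1) * r_xx)

def standardized_reliability(r_xx, n_items, standard_length=100):
    """Reliability re-expressed for a test of standard_length items."""
    return spearman_brown(r_xx, standard_length / n_items)

print(round(spearman_brown(0.60, 4), 2))    # example in the text
print(round(spearman_brown(0.80, 2), 2))    # half test to whole test

# Hypothetical reliabilities for a 40-item and a 150-item test.
short_test = standardized_reliability(0.70, 40)
long_test = standardized_reliability(0.90, 150)
print(round(short_test, 3), round(long_test, 3))
```

With these invented figures the 40-item test, put on the 100-item basis, is estimated at about .854 and the 150-item test at about .857, so the two are nearly equal once length is taken into account even though their raw coefficients are .70 and .90.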




18.7. Effect of Measurement Error on the Sampling Variance 
of the Mean 


Because measurement error affects the variance of a set of measurements 
it will also affect the sampling variance of the mean. The sampling variance 
of the arithmetic mean may be written as

$$\sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{N} = \frac{\sigma_T^2}{N} + \frac{\sigma_e^2}{N} \tag{18.14}$$

The component $\sigma_T^2/N$ is the sampling variance of the means of samples of
true measurements, and $\sigma_e^2/N$ is the component of the sampling variance
attributable to measurement error. While measurement error exerts no systematic
effect on the sample mean as an estimate of $\mu$, such error increases
the variation in sample means with repeated sampling. The increase in
sampling variance over that with no measurement error present is $\sigma_e^2/N$.

The ratio of the sampling variance of the mean of true scores to the
sampling variance of the mean of obtained scores is the reliability coefficient.
Thus

$$\frac{\sigma_{\bar{T}}^2}{\sigma_{\bar{x}}^2} = \frac{\sigma_T^2/N}{\sigma_x^2/N} = \frac{\sigma_T^2}{\sigma_x^2} = \rho_{xx} \tag{18.15}$$

This means that the reliability coefficient may be interpreted as descriptive of
the loss in efficiency of estimation resulting from measurement error. To
illustrate, a mean calculated on a sample of 100 cases, where $\rho_{xx} = .80$, has a
sampling variance equal to that of a mean calculated on a sample of 80 cases
where $\rho_{xx} = 1.00$. The loss in efficiency of estimation resulting from measurement
error amounts to 20 cases in 100.
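The equivalence between 100 fallible cases and 80 error-free cases can be verified directly. In this sketch the variances are invented so that the reliability is .80, and the function name is illustrative.

```python
def sampling_variance_of_mean(var_true, var_error, n):
    """Sampling variance of the mean: true component plus error component."""
    return var_true / n + var_error / n

# Hypothetical variances giving a reliability of 320 / 400 = .80.
var_true, var_error = 320.0, 80.0
reliability = var_true / (var_true + var_error)

with_error = sampling_variance_of_mean(var_true, var_error, 100)
without_error = sampling_variance_of_mean(var_true, 0.0, 80)
print(reliability, with_error, without_error)
```

Both sampling variances equal 4.0, so the effective sample size is the actual sample size multiplied by the reliability coefficient.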


18.8. Effect of Errors of Measurement on the Correlation 
Coefficient 


Errors of measurement tend to reduce the size of the correlation coefficient.
The correlation between true scores will tend to be greater than the correlation
between obtained scores. If $\rho_{xy}$ is the correlation between $X$ and $Y$
in the population, the relation between the correlation of true and obtained
scores is given by

$$\rho_{T_x T_y} = \frac{\rho_{xy}}{\sqrt{\rho_{xx}\rho_{yy}}} \tag{18.16}$$

where $\rho_{T_x T_y}$ = correlation between true scores
      $\rho_{xx}$ = reliability of $X$
      $\rho_{yy}$ = reliability of $Y$
This formula is known as the correction for attenuation. Errors tend to
attenuate the correlation coefficient between obtained scores from the correlation
between true scores. For a derivation of this formula and a discussion
of the simplifying assumptions involved, see Walker and Lev (1953). The
corresponding sample form of the correction for attenuation is

$$r_{T_x T_y} = \frac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} \tag{18.17}$$

To illustrate, let $r_{xy} = .60$, $r_{xx} = .80$, and $r_{yy} = .90$. The correlation
between true scores on $X$ and $Y$, estimated by the above formula, is .707.
The correlation may be viewed as attenuated from .707 to .60 because of 
errors of measurement. The squares of these coefficients yield a better 
appreciation of the loss in predictive capacity due to errors of measurement. 
The squares of .707 and .60 are .50 and .36. We conclude that the presence 
of errors of measurement results in 14 per cent loss in predictive capacity. 
If the correlation between two variables is low, the correlation will not be 
markedly increased by improvements in reliability. If the correlation is 
high, improving reliability may result in substantial gains in the prediction 
of one variable from another. 

Because the correlation between true scores can never exceed unity, the 
maximum correlation between two variables arises where $r_{T_x T_y} = 1$. Under
this circumstance $r_{xy} = \sqrt{r_{xx} r_{yy}}$. This is an estimate of the maximum
correlation between $X$ and $Y$. If $r_{xx} = .80$ and $r_{yy} = .90$, the maximum
possible correlation between $X$ and $Y$ is estimated as $\sqrt{.80 \times .90} = .85$.
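The correction for attenuation and the ceiling it implies can be computed as follows. This is a minimal sketch; the function names are illustrative, and the numbers are those of the worked example in this section.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated correlation between true scores on X and Y."""
    return r_xy / math.sqrt(r_xx * r_yy)

def maximum_correlation(r_xx, r_yy):
    """Largest obtained correlation possible given the two reliabilities."""
    return math.sqrt(r_xx * r_yy)

r_true = correct_for_attenuation(0.60, 0.80, 0.90)
r_max = maximum_correlation(0.80, 0.90)
print(round(r_true, 3), round(r_max, 2))
print(round(r_true ** 2, 2), round(0.60 ** 2, 2))   # squared coefficients
```

The squared coefficients, .50 and .36, differ by .14, which is the 14 per cent loss in predictive capacity mentioned above.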


18.9. Reliability of Difference Scores 


Situations arise where the difference between two sets of measurements 
is defined as a score. The two measurements may be initial, or prestimulus 
values, and values obtained in the presence of a stimulus factor. If differences
are obtained between standard scores on $X$ and $Y$, that is, between
$z_x$ and $z_y$, the reliability of the differences may be estimated by

$$r_{dd} = \frac{r_{xx} + r_{yy} - 2r_{xy}}{2(1 - r_{xy})} \tag{18.18}$$

where $r_{xx}$ and $r_{yy}$ = reliability coefficients for $X$ and $Y$
      $r_{dd}$ = reliability of difference $z_x - z_y$
For fixed values of $r_{xx}$ and $r_{yy}$ the reliability of the difference will decrease
with increase in $r_{xy}$ from zero. If $r_{xx} = .90$ and $r_{yy} = .80$, for $r_{xy} = .80$ the
reliability of differences $r_{dd} = .25$. For $r_{xy} = 0$, $r_{dd} = .85$. As $r_{xy}$ departs
in a positive direction from zero, the error variance accounts for an increasing 
proportion of the total variance of differences, with a resulting decrease in 
reliability. The point to note here is that difference scores may be grossly 
unreliable and should be used only after careful scrutiny of the data. When
the correlation between the two variables is reasonably high, it is probable
that with many sets of data most of the variance of differences is error
variance.
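The fall in the reliability of difference scores as the two measures become more highly correlated can be tabulated with a short function. The function name is illustrative; the reliabilities of .90 and .80 are those of the example in this section.

```python
def difference_reliability(r_xx, r_yy, r_xy):
    """Reliability of the difference between standard scores on X and Y."""
    return (r_xx + r_yy - 2 * r_xy) / (2 * (1 - r_xy))

for r_xy in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(r_xy, round(difference_reliability(0.90, 0.80, r_xy), 2))
```

The values run from .85 when the measures are uncorrelated down to .25 when they correlate .80, which shows how quickly difference scores lose reliability.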


18.10. The Standard Error of Measurement 


Because $\rho_{xx} = \sigma_T^2/\sigma_x^2$ and $\sigma_x^2 = \sigma_T^2 + \sigma_e^2$, we may write

$$\rho_{xx} = 1 - \frac{\sigma_e^2}{\sigma_x^2} \tag{18.19}$$

and

$$\sigma_e = \sigma_x\sqrt{1 - \rho_{xx}} \tag{18.20}$$

This latter formula is the standard error of measurement. Where $s_x$ and
$r_{xx}$ are used as estimates of $\sigma_x$ and $\rho_{xx}$, we obtain

$$s_e = s_x\sqrt{1 - r_{xx}} \tag{18.21}$$

as the corresponding sample estimate. If it may be assumed that errors of
measurement are independent of the magnitude of test score, then $s_e$ may
be used as the standard error associated with a single score and interpreted
in the same way as the standard error of any statistic. On the assumption
of a normal-curve approximation, the 95 and 99 per cent confidence intervals
of an individual's score $X_i$ are estimated by $X_i \pm 1.96s_e$ and $X_i \pm 2.58s_e$,
respectively. With most psychological tests, however, errors of measurement
are not independent of the magnitude of test score. The standard
error is higher in the middle-score range and diminishes in size as the score
departs from the average. Because of this the use of $s_e$ to estimate confidence
intervals for particular scores may yield misleading results. The
variance $s_e^2$ is a sort of average value, and $s_e$ when applied to particular
scores has meaning only in relation to scores near the average.
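The standard error of measurement and the normal-curve interval built on it can be sketched as follows. The standard deviation of 15, the reliability of .91, and the score of 110 are invented for illustration, and the function names are not from the text.

```python
import math

def standard_error_of_measurement(s_x, r_xx):
    """s_e = s_x * sqrt(1 - r_xx)."""
    return s_x * math.sqrt(1 - r_xx)

def score_interval(score, s_e, z=1.96):
    """Normal-curve interval; assumes error is unrelated to score level."""
    return score - z * s_e, score + z * s_e

# Hypothetical test: standard deviation 15, reliability .91.
s_e = standard_error_of_measurement(15.0, 0.91)
lower, upper = score_interval(110.0, s_e)
print(round(s_e, 2), round(lower, 2), round(upper, 2))
```

With these figures $s_e$ is 4.5 and the 95 per cent interval runs from about 101.2 to 118.8. As the text cautions, an interval of this kind is most defensible for scores near the average.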

The problem of the standard error of measurement associated with psychological
test scores has been investigated by Lord (1955a, 1955b, 1957).
Lord defines the standard error of measurement as the standard deviation
of scores an individual might be expected to obtain on a large number of
randomly parallel test forms. The assumption is that the ability of the
individual remains unchanged and is not affected by practice, fatigue, and
the like. Randomly parallel forms are viewed as composed of items drawn
at random from a large pool or population of items. The items are scored 1
for a pass and 0 for a failure, a score on a test being the sum of item scores.
The proportion of items in the population which individual $i$ can do correctly
is $\theta_i$. The true score of individual $i$ for a test of $n$ items is $T_i = n\theta_i$. The
number of items done correctly by individual $i$ for a random sample of
$n$ items is $X_i$. The standard deviation of the sampling distribution of the
$X_i$'s is the standard error. This is obtained from the standard deviation
of the binomial and is given by


\[
\sigma_e(X_i) = \sqrt{n\theta_i(1 - \theta_i)} = \sqrt{\frac{T_i(n - T_i)}{n}} \tag{18.22}
\]


An individual's score $X_i$ may be used as an estimate of $T_i$. Introducing
the factor $n/(n - 1)$ to obtain an unbiased estimate yields

\[
s_e(X_i) = \sqrt{\frac{X_i(n - X_i)}{n - 1}} \tag{18.23}
\]


This formula may be used for estimating the standard error of a test score
$X_i$. Where $n = 100$ and $X_i = 50$, $s_e(X_i) = 5.02$. Where $X_i = 80$ and
$n = 100$, $s_e(X_i) = 4.02$. The standard error diminishes in size as the more
extreme values are approached.
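
As an illustrative sketch of formula (18.23), the short Python function below computes the estimated standard error of a single score. The name `se_of_score` is introduced here for illustration only and does not appear in the text.

```python
import math

def se_of_score(x, n):
    """Estimated standard error of a score x on an n-item test, formula (18.23)."""
    return math.sqrt(x * (n - x) / (n - 1))

# Scores on a 100-item test: the standard error shrinks toward the extremes.
for score in (50, 80, 100):
    print(score, round(se_of_score(score, 100), 3))
```

The printed values are unrounded to three places and so may differ in the last digit from the two-place figures quoted above.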

Because $\sigma_e(X_i)$ depends on $T_i$, the 95 per cent confidence interval for a
score $X_i$ cannot be estimated by simply obtaining $X_i \pm 1.96s_e(X_i)$. The
standard error of the upper limit will in general differ from the standard error
of the lower limit, and this circumstance must enter into the procedure used
for determining the interval. Denote $X_U$ and $X_L$ as the upper and lower
confidence limits. These limits may be calculated by solving for $X_U$ and
$X_L$ in

\[
\begin{aligned}
X_U &= X_i + 1.96\sqrt{\frac{X_U(n - X_U)}{n}} \\
X_L &= X_i - 1.96\sqrt{\frac{X_L(n - X_L)}{n}}
\end{aligned}
\]


Consider a score 80 for a 100-item test. The upper limit $X_U = 86.7$, and
the lower limit $X_L = 71.1$. The standard error of the upper limit is 3.4,
and of the lower limit 4.5. The obtained score of 80 is 1.96 × 3.4 below
86.7 and 1.96 × 4.5 above 71.1. Consider a situation where an individual
obtains a score of 100 on a 100-item test. The upper limit is 100. The
lower 95 per cent limit obtained by solving for $X_L$ is 96.38. The standard
error of measurement for this individual is estimated to range from 0 to 1.87.
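
The two limits can be found in closed form. The sketch below assumes that the standard error at each limit is the binomial value of formula (18.22) with the limit put in place of the true score; squaring either defining equation then gives one quadratic whose two roots are the lower and upper limits. The names `score_limits` and `se_at` are hypothetical helpers, not part of the text.

```python
import math

def score_limits(x, n, z=1.96):
    """Solve (L - x)^2 = z^2 * L * (n - L) / n for the two limits L."""
    a = 1 + z * z / n          # coefficient of L^2
    b = 2 * x + z * z          # (negative of the) coefficient of L
    root = math.sqrt(b * b - 4 * a * x * x)
    return (b - root) / (2 * a), (b + root) / (2 * a)

def se_at(true_score, n):
    """Binomial standard error when the true score equals true_score."""
    return math.sqrt(true_score * (n - true_score) / n)

lower, upper = score_limits(80, 100)
print(round(lower, 1), round(upper, 1))                         # limits for a score of 80
print(round(se_at(lower, 100), 1), round(se_at(upper, 100), 1)) # standard errors at the limits
```

For a score of 80 on 100 items this gives limits near 71.1 and 86.7, with standard errors near 4.5 and 3.4 at the lower and upper limits respectively.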

Lord (1955a) has shown that if $s_e^2$ is taken as the average of $s_e^2(X_i)$ and
substituted in $r_{xx} = 1 - s_e^2/s_x^2$, unbiased variance estimates being used
throughout, Kuder-Richardson formula 21, described in Sec. 18.4, is obtained.

In most practical situations where parallel tests are used, the tests are
not randomly parallel in the strictest sense. The items are matched to
some extent. The standard error for such tests will be less than that estimated
by $s_e(X_i)$. Thus $s_e(X_i)$ in most situations will tend to be a moderate
overestimate. It is of interest to note that $s_e(X_i)$ is independent of the
characteristics of the items of which a test is composed, provided, of course,
that these are scored 1 for a pass and 0 for a failure.


18.11. Concluding Observations 


The theory and method associated with the study of measurement error
in psychology have been developed in relation to psychological testing. Much
of this theory and method is generally applicable to measurements of all 
kinds. Little attention has been directed to the study of measurement error 
by experimental psychologists. It is probable that in much work in the 
field of human and animal learning, fairly gross error attaches to many of 
the measurements made. Reliability coefficients less than .50 are not 
uncommon, and coefficients of zero are perhaps not isolated curiosities. 
The errors which attach to measurements in the field of animal experimenta- 
tion are known quite often to be substantial. Low reliability does not 
necessarily invalidate a technique as a device for drawing valid inferences. 
Low reliability may be compensated for by increase in sample size. An 
unreliable technique used with a small sample is, however, capable of detect- 
ing gross differences only, and the probability of not rejecting the null 
hypothesis when it is false may be high. When significant results are 
reported with an unreliable technique on a small sample, the treatment 
applied is usually exerting a gross effect. 

A common type of experimental design requires the making of measure- 
ments on an experimental group in the presence of a treatment and on a 
control group in the absence of the treatment. Although substantive evi- 
dence is lacking, it is probable that in many experiments the measurements 
are less reliable under the experimental than under the control conditions, 
one of the effects of the treatment being to increase measurement error. 
It seems probable that this effect is more likely to occur when the treatment 
is in the nature of a gross assault on the normal functioning of the organism, 
as is the case with certain drugs, stress agents, and operative procedures. 
Experimental situations may be found where the treatment may increase 
rather than decrease the reliability of the measurements. This author can 
recall one experiment where the important effect of the treatment was to 
stabilize and make more reliable the responses of the experimental animals. 

The discussion of measurement error given in this chapter is of necessity 
brief and incomplete. The most comprehensive discussion available on 
measurement error as applied to psychological tests is found in Gulliksen 
(1950). A brief but straightforward treatment of measurement error is 
given by Guilford (1954). For a consideration of the analysis of variance 
as applied to test reliability and other specialized topics, including the 
Kuder-Richardson formulas, the reader is referred to the monograph by 
Jackson and Ferguson (1941). On the standard error of a test score the
work of Lord (1955a, 1955b, 1957) is important. For an analysis of the
interpretation of reliability coefficients calculated by different methods, the 
reader should consult Cronbach (1947, 1951). 


EXERCISES 


1. For $r_{xx} = .90$ and $s_x = 15$, estimate the variance of true scores and the error variance.
   What percentage of the obtained variance is due to error?

2. The following are correlations between half tests: .30, .50, .72, .80, .96. Find reliability
   coefficients for the whole tests.

3. The following are difficulty values $p_i$ for a test of 20 items:

   (1) .97    (6) .53    (11) .50    (16) .04
   (2) .95    (7) .75    (12) .55    (17) .35
   (3) .76    (8) .40    (13) .42    (18) .27
   (4) .80    (9) .82    (14) .30    (19) .15
   (5) .60   (10) .20    (15) .15    (20) .09

   The standard deviation of test scores is 6.5. Calculate reliability coefficients using both
   Kuder-Richardson formulas 20 and 21. Explain the difference between the two
   coefficients.

4. For a particular test $r_{xx} = .50$. What is the effect on the reliability coefficient of making
   the test five times as long?

5. The sampling variance of an arithmetic mean of a test is 6.2 where $r_{xx} = .80$. What
   part of the sampling variance is due to sampling error, and what part to measurement
   error? If the test were made three times as long, what proportion of the sampling
   variance of the mean of the lengthened test would be due to measurement error?

6. Estimate the correlation between true scores on X and Y where $r_{xy} = .60$, $r_{xx} = .80$,
   and $r_{yy} = .90$. What is the maximum possible correlation between X and Y?

7. For the data of Exercise 6 above, calculate the reliability of difference scores in
   standard-score form between X and Y.

8. Estimate the standard error associated with the individual scores 7, 26, and 44 for a test
   of 50 items.


CHAPTER 19


PARTIAL AND MULTIPLE CORRELATION 


19.1. Introduction 


Previous discussion of correlation has been concerned with the relation- 
ship between two variables. In many investigations data on more than 
two variables are gathered and forms of multivariate analysis are required. 
Two forms of correlational analysis which may be applied to multivariate 
data are partial and multiple correlation. Partial correlation deals with the 
residual relationship between two variables where the common influence 
of one or more other variables has been removed. Multiple correlation 
deals with the calculation of weights which produce the maximum possible 
correlation between a criterion variable and the weighted sum of two or more 
predictor variables. Its purpose is to maximize the efficiency of prediction. 
Other forms of multivariate analysis exist, but these are beyond the scope 
of the present elementary discussion. 


19.2. Partial Correlation 


Let us assume that a test of intelligence and a test of psychomotor ability 
have been administered to a group of children showing considerable variation 
in age. Both intelligence and psychomotor ability increase with age. Ten-year-old
children are on the average more intelligent than six-year-old
children. They also have more highly developed psychomotor abilities.
Scores on the two tests will correlate with each other because both are 
correlated with age. Partial correlation may be used with such data to 
obtain a measure of correlation with the effect of age eliminated or removed. 

What is meant by eliminating, or removing, the effect of a third variable?
These terms in the present context have a precise statistical meaning. Let
$X_1$, $X_2$, and $X_3$ be three variables. All or part of the correlation between
$X_1$ and $X_2$ may result because both are correlated with $X_3$. The reader
will recall from previous discussion on correlation that a score on $X_1$ may
be divided into two parts. One part is a score predicted from $X_3$. The
other part is the residual, or error of estimate, in predicting $X_1$ from $X_3$.
These two parts are independent, or uncorrelated. Similarly, a score on
$X_2$ may be divided into two parts, a part predictable from $X_3$ and a residual,
or error of estimate, in predicting $X_2$ from $X_3$. The correlation between the
two sets of residuals, or errors of estimate, in predicting $X_1$ from $X_3$ and $X_2$
from $X_3$ is the partial correlation coefficient. It is the part of the correlation
which remains when the effect of the third variable is eliminated, or
removed.

The formula for calculating the partial correlation coefficient to eliminate
a third variable is

\[
r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} \tag{19.1}
\]

The notation $r_{12.3}$ means the correlation between residuals when $X_3$ has been
removed from both $X_1$ and $X_2$. This is sometimes called a first-order partial
correlation coefficient.
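
The equivalence between the residual definition and formula (19.1) can be checked numerically. The sketch below uses a small set of made-up scores (hypothetical values chosen only for illustration, not data from the text); the helper names `corr` and `residuals` are likewise assumptions.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(x, y):
    """Product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """Errors of estimate in predicting y from x by a least-squares line."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

# Hypothetical scores for eight children (illustrative only).
x3 = [6, 7, 7, 8, 9, 9, 10, 10]          # age
x1 = [12, 15, 13, 18, 17, 21, 20, 24]    # intelligence test
x2 = [5, 4, 7, 8, 7, 10, 12, 11]         # psychomotor test

# Partial correlation as the correlation between the two sets of residuals.
r_resid = corr(residuals(x1, x3), residuals(x2, x3))

# Partial correlation from the three ordinary correlations, formula (19.1).
r12, r13, r23 = corr(x1, x2), corr(x1, x3), corr(x2, x3)
r_formula = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

print(round(r_resid, 4), round(r_formula, 4))  # the two agree
```

The two routes give the same value, which is why the formula can be applied directly to a table of correlations without computing any residuals.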

Let $X_1$ and $X_2$ be scores on an intelligence and a psychomotor test for a
group of school children. Let $X_3$ be age. Let the correlations between the
three variables be as follows: $r_{12} = .55$, $r_{13} = .60$, and $r_{23} = .50$. The
partial correlation coefficient is

\[
r_{12.3} = \frac{.55 - .60 \times .50}{\sqrt{(1 - .60^2)(1 - .50^2)}} = .36
\]


Using a variance interpretation, the proportion overlap between $X_1$ and
$X_2$ is $r_{12}^2 = .55^2 = .303$. The proportion overlap with $X_3$ eliminated
is $r_{12.3}^2 = .36^2 = .130$. The proportion overlap which results from the
effects of age is .303 - .130 = .173. It would also be appropriate to state
that the percentage of the total association present resulting from the effect
of age is (.173/.303)100 = 57 per cent. The remaining 43 per cent of the
association results from other factors.
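
The arithmetic of this example can be retraced with a few lines of Python. This is a sketch only; the function name `partial_r` is an assumption introduced for illustration.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation, formula (19.1)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

r12, r13, r23 = 0.55, 0.60, 0.50
r12_3 = partial_r(r12, r13, r23)

overlap_total = r12 ** 2        # proportion overlap between X1 and X2
overlap_partial = r12_3 ** 2    # overlap remaining with age removed
share_age = (overlap_total - overlap_partial) / overlap_total

print(round(r12_3, 2), round(overlap_total, 3), round(overlap_partial, 3))
print(round(100 * share_age), "per cent of the association attributed to age")
```

Because the code carries the unrounded coefficient through each step, its figures may differ in the third decimal place from values obtained by rounding at every stage of a hand calculation.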

Partial correlation may be used to remove the effect of more than one
variable. The partial correlation between $X_1$ and $X_2$ with the effects of
both $X_3$ and $X_4$ removed is

\[
r_{12.34} = \frac{r_{12.4} - r_{13.4}r_{23.4}}{\sqrt{(1 - r_{13.4}^2)(1 - r_{23.4}^2)}} \tag{19.2}
\]


This is a second-order partial correlation coefficient. Because of difficulties 
of interpretation, partial correlation coefficients involving the elimination 
of more than one variable are infrequently calculated. 

A $t$ test may be used to test whether a partial correlation coefficient is
significantly different from zero. The required $t$ is

\[
t = \frac{r_{12.3}}{\sqrt{(1 - r_{12.3}^2)/(N - 3)}} \tag{19.3}
\]

This may be referred to a table of $t$ with $N - 3$ degrees of freedom.
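
A minimal sketch of formula (19.3) follows. The text gives no sample size for the age example, so N = 100 below is an assumed value used only to show the computation; `t_partial` is a hypothetical helper name.

```python
import math

def t_partial(r, n_obs):
    """t ratio for a first-order partial correlation, formula (19.3)."""
    return r / math.sqrt((1 - r ** 2) / (n_obs - 3))

n_obs = 100                      # assumed sample size, not given in the text
t = t_partial(0.36, n_obs)
df = n_obs - 3                   # degrees of freedom for the table of t

print(round(t, 2), df)
```

The resulting value is then compared with the tabled critical value of $t$ for the stated degrees of freedom.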




19.3. Multiple Regression and Correlation 


The correlation coefficient may be used to predict or estimate a score on
an unknown variable from knowledge of a score on a known variable. The
regression equation in standard-score form is

\[
z_1' = r_{12}z_2
\]

where $z_1'$ is a predicted or estimated standard score. In this situation we
have one dependent and one independent variable. If $z_2 = 1.2$ and $r_{12} = .80$,
the best estimate of an individual's standard score on variable 1 is

\[
z_1' = .80 \times 1.2 = .96
\]

The estimate is that the individual is .96 standard deviation units above
the average.

We may consider a situation where we have one dependent and two independent
variables. The dependent variable may be a measure of scholastic
success. The independent variables may be two psychological tests used
at university entrance. The dependent variable is spoken of as the criterion.
The two independent variables are predictors. How may scores on the two
predictors be combined to predict scholastic success? The correlations
between the three variables may be arranged in a small table. Let these
correlations be as follows:

             1      2      3
      1     1.0    .8     .3
      2      .8   1.0     .5
      3      .3    .5    1.0

Variable 1 is the criterion, and variables 2 and 3 are the predictors. Note
that 1.0's have been entered along the main diagonal. In estimating standard
scores on 1 from standard scores on 2 and 3 separately, the two regression
equations are $z_1' = .8z_2$ and $z_1' = .3z_3$. Variable 2 is a much better predictor
than variable 3. Presumably, by employing a knowledge of both 2 and 3,
a better estimate of the criterion may be obtained.

Consider the straight sum of standard scores on 2 and 3. If the sums
of the values in the four quadrants of the correlation table are represented by

      A | C
      --+--
      C | B

the correlation between a standard score on 1 with the sum of standard
scores on 2 and 3 is given by

\[
\frac{C}{\sqrt{AB}}
\]

In our example this becomes

\[
\frac{C}{\sqrt{AB}} = \frac{.8 + .3}{\sqrt{1.0 + 1.0 + .5 + .5}} = .635
\]


If we express variables 2 and 3 in standard measure, add them together, and 
correlate the sum with standard scores on the criterion, the correlation will 
be .635. This is not as good as the prediction obtained with variable 2 
taken alone. The straight sum of standard scores assigns equal weight to 
the two variables. When variables are added together directly, they are 
weighted in a manner proportional to their standard deviations. The 
standard deviation of standard scores is 1. Consequently, on adding 
together standard scores, the variables are equally weighted. 

Let us select some arbitrary set of weights and observe the result. Let us
assign weights of 4 and 1 to the two predictors. Thus one predictor will
receive four times the weight of the other. Write these weights along the
top and to the side of the correlation table and multiply the rows and
columns as follows:

                         4       1
                 1       2       3
           1    1.0 |   3.2     .3
      4    2    3.2 |  16.0    2.0
      1    3     .3 |   2.0    1.0

The correlation of the criterion with the sum $4z_2 + z_3$ is again given by
$C/\sqrt{AB}$ and is

\[
\frac{C}{\sqrt{AB}} = \frac{3.2 + .3}{\sqrt{16.0 + 1.0 + 2.0 + 2.0}} = .764
\]


This particular arrangement of weights, 4 and 1, results in a correlation
which is substantially better than that obtained with equal weights. Obviously,
these are not the best possible weights. The correlation of the
weighted sum with the criterion is less than that obtained with variable 2
taken separately.
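
The quadrant rule $C/\sqrt{AB}$ can be sketched for any set of weights. In the code below the criterion correlations .8 and .3 are those of the example, and the correlation of .5 between the two predictors is the value implied by the sums used in the text; `composite_r` is a hypothetical helper name.

```python
import math

def composite_r(r_crit, R_pred, weights):
    """Correlation of a criterion with a weighted sum of standard scores.

    C is the weighted sum of criterion correlations, B the weighted sum of
    the predictor block, and A = 1 for a standardized criterion.
    """
    c = sum(w * r for w, r in zip(weights, r_crit))
    b = sum(wi * wj * R_pred[i][j]
            for i, wi in enumerate(weights)
            for j, wj in enumerate(weights))
    return c / math.sqrt(b)

r_crit = [0.8, 0.3]                  # correlations of predictors 2 and 3 with the criterion
R_pred = [[1.0, 0.5], [0.5, 1.0]]    # intercorrelation of the predictors (r23 = .5 assumed)

equal = composite_r(r_crit, R_pred, [1, 1])      # straight sum of standard scores
four_one = composite_r(r_crit, R_pred, [4, 1])   # arbitrary weights 4 and 1

print(round(equal, 3), round(four_one, 3))       # about .635 and .764
```

Neither set of weights reaches the correlation of .8 given by variable 2 alone, which is the point of the passage.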




How may a set of weights be obtained which will maximize the correlation
between the criterion and the weighted sum of scores on the independent variables?
Let us represent weights by the symbols $\beta_2$ and $\beta_3$. An estimated standard
score on 1 is then given by

\[
z_1' = \beta_2z_2 + \beta_3z_3
\]

We wish to calculate weights $\beta_2$ and $\beta_3$ such that the correlation between
$z_1$ and $z_1'$ is a maximum. Mathematically, the problem reduces to the calculation
of weights which will minimize the average sum of squares of differences
between the criterion score $z_1$ and the estimated criterion score $z_1'$. We
require values of $\beta_2$ and $\beta_3$ such that

\[
\frac{1}{N}\sum (z_1 - z_1')^2 = \text{a minimum}
\]

The values of $\beta_2$ and $\beta_3$ are multiple regression weights for standard scores.
They are sometimes called beta coefficients.
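With three variables the values of $\beta_2$ and $\beta_3$ are given by

\[
\beta_2 = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \tag{19.4}
\]

\[
\beta_3 = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} \tag{19.5}
\]

In the above example

\[
\beta_2 = \frac{.8 - .3 \times .5}{1 - .5^2} = .867 \qquad
\beta_3 = \frac{.3 - .8 \times .5}{1 - .5^2} = -.133
\]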


Let us write these weights above and to the side of the correlation table
and multiply the rows and columns as follows:

                             .867    -.133
                    1         2        3
              1    1.000 |   .694    -.040
     .867     2     .694 |   .752    -.058
    -.133     3    -.040 |  -.058     .018

The correlation between the criterion and the weighted sum is

\[
\frac{C}{\sqrt{AB}} = \frac{.654}{\sqrt{.654}} = \sqrt{.654} = .809
\]




This is a multiple correlation coefficient and may be denoted by $R$. No
other system of weights will yield a higher correlation between the criterion
and the weighted sum of predictors.

Note that the sum of elements in the top right quadrant of the weighted
correlation table is equal to the sum in the lower right, or $C = B$. This
circumstance will occur if the weights used are multiple regression weights.
It provides a check on the calculation. We note also that $R^2 = C$ and
$R = \sqrt{C}$. Thus the multiple correlation coefficient may be obtained by
the formula

\[
R = \sqrt{\beta_2r_{12} + \beta_3r_{13}} \tag{19.6}
\]

This is the commonly used formula for calculating a multiple correlation
coefficient.

In our example the multiple correlation is .809. The correlation of
variable 2 with the criterion is .8. The addition of the third variable
increases prediction very slightly. In a practical situation the third variable
could safely be discarded as contributing a negligible amount to the efficacy
of prediction.
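
As a hedged sketch of formulas (19.4) to (19.6), the code below computes the two beta coefficients and the multiple correlation for the example, and verifies the check $C = B$. The predictor intercorrelation of .5 is the value implied by the worked figures; `betas_two_predictors` is a hypothetical name.

```python
import math

def betas_two_predictors(r12, r13, r23):
    """Standard-score regression weights for two predictors, (19.4) and (19.5)."""
    b2 = (r12 - r13 * r23) / (1 - r23 ** 2)
    b3 = (r13 - r12 * r23) / (1 - r23 ** 2)
    return b2, b3

r12, r13, r23 = 0.8, 0.3, 0.5
b2, b3 = betas_two_predictors(r12, r13, r23)

C = b2 * r12 + b3 * r13                       # top right quadrant of the weighted table
B = b2 ** 2 + b3 ** 2 + 2 * b2 * b3 * r23     # lower right quadrant
R = math.sqrt(C)                              # formula (19.6)

print(round(b2, 3), round(b3, 3))   # 0.867 -0.133
print(round(C, 3), round(B, 3), round(R, 2))
```

With unrounded weights the multiple correlation comes out at about .81; small third-place differences from a hand calculation arise from rounding the betas before multiplying.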


19.4. The Regression Equation for Raw Scores 


The equation $z_1' = \beta_2z_2 + \beta_3z_3$ is a regression equation in standard-score
form. It will yield the best possible linear prediction of a standard score
on 1 from standard scores on 2 and 3. In practice, we usually require a
regression equation for predicting a raw score on 1 from a raw score on
2 and 3. Let $X_1'$ be a predicted raw score on 1, and $X_2$ and $X_3$ the obtained
raw scores on 2 and 3. The estimated standard score $z_1'$ and the observed
standard scores $z_2$ and $z_3$ may be written as

\[
z_1' = \frac{X_1' - \bar{X}_1}{s_1} \qquad
z_2 = \frac{X_2 - \bar{X}_2}{s_2} \qquad
z_3 = \frac{X_3 - \bar{X}_3}{s_3}
\]

By substituting these values in the regression equation in standard-score
form we obtain

\[
\frac{X_1' - \bar{X}_1}{s_1} = \beta_2\frac{X_2 - \bar{X}_2}{s_2} + \beta_3\frac{X_3 - \bar{X}_3}{s_3}
\]

Rearranging terms and writing the expression explicit for $X_1'$ yields

\[
X_1' = \beta_2\frac{s_1}{s_2}X_2 + \beta_3\frac{s_1}{s_3}X_3
     + \left(\bar{X}_1 - \beta_2\frac{s_1}{s_2}\bar{X}_2 - \beta_3\frac{s_1}{s_3}\bar{X}_3\right) \tag{19.7}
\]




This is a regression equation in raw-score form. It may be used to predict
a raw score on 1 from a raw score on 2 and 3. The values $\beta_2s_1/s_2$ and $\beta_3s_1/s_3$
act as weights. The quantity to the right in parentheses is a constant.

In the example of the previous section $\beta_2 = .867$ and $\beta_3 = -.133$. Let
us assume that $s_1 = 5$, $s_2 = 10$, $s_3 = 20$; also $\bar{X}_1 = 20$, $\bar{X}_2 = 40$, and
$\bar{X}_3 = 60$. The regression equation in raw-score form is written as

\[
\begin{aligned}
X_1' &= (.867)\tfrac{5}{10}X_2 + (-.133)\tfrac{5}{20}X_3 + \left[20 - (.867)\tfrac{5}{10}(40) - (-.133)\tfrac{5}{20}(60)\right] \\
     &= .434X_2 - .033X_3 + 4.62
\end{aligned}
\]
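
The conversion from standard-score weights to a raw-score equation, formula (19.7), may be sketched as follows. The function name `raw_score_equation` and its argument names are assumptions made for illustration.

```python
def raw_score_equation(betas, s_crit, s_pred, mean_crit, mean_pred):
    """Raw-score weights and constant from standard-score weights, formula (19.7)."""
    weights = [b * s_crit / s for b, s in zip(betas, s_pred)]
    constant = mean_crit - sum(w * m for w, m in zip(weights, mean_pred))
    return weights, constant

weights, constant = raw_score_equation(
    betas=[0.867, -0.133],   # beta_2 and beta_3 from the previous section
    s_crit=5, s_pred=[10, 20],
    mean_crit=20, mean_pred=[40, 60],
)

def predict(x2, x3):
    """Predicted raw score on the criterion."""
    return weights[0] * x2 + weights[1] * x3 + constant

print([round(w, 4) for w in weights], round(constant, 3))
print(round(predict(40, 60), 2))   # scores at the predictor means give the criterion mean
```

Carrying the weights unrounded gives a constant near 4.66; the figure 4.62 results when the weights are first rounded to .434 and -.033, as in the hand calculation.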


19.5. The Geometry of Multiple Regression 


Given two variables $X_1$ and $X_2$, each pair of observations may be plotted
as a point on a plane. If interest resides in predicting one variable from
a knowledge of another, a straight regression line may be fitted to the points
and this line used for prediction purposes.

Fig. 19.1. Geometrical representation of multiple regression. ABCD is a multiple
regression plane.


Given three variables $X_1$, $X_2$, and $X_3$, each triplet of observations may
be plotted as a point in a space of three dimensions as shown in Fig. 19.1.
Instead of two axes at right angles to each other, we now have three. All
triplets of observations may be plotted as points. If the correlations between
the three variables are all positive, the assembly of points will show some
tendency to cluster along the diagonal of the space of three dimensions
extending from the origin O to Y. A plane may be fitted to the assembly of
points. With two variables a regression line is fitted to points in a two-dimensional
space. With three variables a regression plane is fitted to
points in a three-dimensional space. In Fig. 19.1 this plane is represented
by ABCD. With two variables the regression equation is the equation for
a straight line and is of the type $X_1' = b_2X_2 + a$, where $b_2$ is the slope of the
line and $a$ is the point where the line intercepts the $X_1$ axis. With three
variables the regression equation is the equation for a plane and is of the
type $X_1' = b_2X_2 + b_3X_3 + a$. Here $b_2$ is the slope of the line AD in Fig.
19.1 and $b_3$ is the slope of the line AB. The constant $a$ is the point where
the plane intercepts the $X_1$ axis. In Fig. 19.1 it is the distance AO.

Consider now a particular individual. Represent his score on $X_2$ by OE
and on $X_3$ by OF. We locate the point G in the plane of $X_2$ and $X_3$ and
proceed upward until we reach the point H in the regression plane ABCD.
The distance GH is the best estimate of the individual's score on $X_1$ given
his scores on $X_2$ and $X_3$. It is the best estimate in the sense that the regression
plane is so located as to minimize the sums of squares of deviations
from it parallel to the $X_1$ axis.

The reader will observe that the three-variable case is a simple extension 
of the two-variable case. A plane is used instead of a straight line. With 
four or more variables the idea is essentially the same. With four variables, 
in effect, we plot points in a space of four dimensions and fit a three-dimen- 
sional hyperplane to these points. By increasing the number of variables 
we may complicate the arithmetic. We do not complicate the idea. 


19.6. More than Three Variables 


In the discussion above we have considered the multiple regression case
with three variables only, one criterion, and two predictors. With $k$ variables
the multiple regression equation in standard-score form is

\[
z_1' = \beta_2z_2 + \beta_3z_3 + \cdots + \beta_kz_k \tag{19.8}
\]

The raw-score form of this equation may be obtained, as previously, by
substituting for the values of $z_i$ the values $(X_i - \bar{X}_i)/s_i$ and rearranging
terms. We thereby obtain

\[
X_1' = \beta_2\frac{s_1}{s_2}X_2 + \beta_3\frac{s_1}{s_3}X_3 + \cdots + \beta_k\frac{s_1}{s_k}X_k + A \tag{19.9}
\]

where $A$ is given by

\[
A = \bar{X}_1 - \beta_2\frac{s_1}{s_2}\bar{X}_2 - \beta_3\frac{s_1}{s_3}\bar{X}_3 - \cdots - \beta_k\frac{s_1}{s_k}\bar{X}_k \tag{19.10}
\]

The multiple correlation coefficient is given by

\[
R = \sqrt{\beta_2r_{12} + \beta_3r_{13} + \cdots + \beta_kr_{1k}} \tag{19.11}
\]




Thus to calculate this coefficient we multiply each correlation of a predictor 
with the criterion by its corresponding regression coefficient, sum these 
products, and take the square root. 

A number of computational procedures exist for calculating the required
regression weights with more than three variables. A widely used method
is the Doolittle method. The method described here originates with Aitken
(1937) and has been called the method of pivotal condensation. It is
described in detail in Thomson (1950).


19.7. Aitken's Numerical Solution 


To illustrate the application of Aitken's method let us consider a problem
with five variables, one criterion, and four predictors. Denote the criterion
by $X_1$ and the predictors by $X_2$, $X_3$, $X_4$, and $X_5$. The criterion may be
regarded as a measure of success in an occupation, and the predictors may
be psychological tests used to predict performance in the occupation.

The intercorrelations between the five variables are shown in Table 19.1.
The means and standard deviations of the five variables are shown in Table
19.2. Table 19.3 shows the procedure for calculating the multiple regression
weights. This procedure requires the successive calculation of differences
between cross products. If the four cell values are

      a   b
      c   d

the difference between cross products is

      ad - cb

In this case the cell value a is the pivotal element.
TABLE 19.1
CORRELATION COEFFICIENTS BETWEEN A CRITERION AND FOUR PREDICTORS

            X_1     X_2     X_3     X_4     X_5
    X_1    1.00     .72     .58     .41     .63
    X_2     .72    1.00     .69     .49     .39
    X_3     .58     .69    1.00     .38     .19
    X_4     .41     .49     .38    1.00     .27
    X_5     .63     .39     .19     .27    1.00


TABLE 19.2
MEANS AND STANDARD DEVIATIONS FOR CRITERION AND FOUR PREDICTORS

                            X_1      X_2      X_3      X_4      X_5
    Mean                   8.72   104.65    43.22    14.98    87.22
    Standard deviation     5.68    15.71     9.92     6.32    14.20


TABLE 19.3
AITKEN'S METHOD FOR COMPUTING REGRESSION COEFFICIENTS*

    Slab                                                                   Check
     A    1.00    .69    .49    .39 |   -1       .       .       .    |    1.57
           .69   1.00    .38    .19 |    .      -1       .       .    |    1.26
           .49    .38   1.00    .27 |    .       .      -1       .    |    1.14
           .39    .19    .27   1.00 |    .       .       .      -1    |     .85
           .72    .58    .41    .63 |    .       .       .       .    |    2.34

     B    .524   .042  -.079 |   .690   -1       .       .    |    .177
         1.000   .080  -.151 |  1.317   -1.908   .       .    |    .338
          .042   .760   .079 |   .490    .      -1       .    |    .371
         -.079   .079   .848 |   .390    .       .      -1    |    .238
          .083   .057   .349 |   .720    .       .       .    |   1.210

     C    .757   .085 |   .435   .080   -1       .    |    .357
         1.000   .112 |   .575   .106   -1.321   .    |    .472
          .085   .836 |   .494  -.151    .      -1    |    .265
          .050   .362 |   .611   .158    .       .    |   1.182

     D    .826 |   .445  -.160   .112   -1      |    .225
         1.000 |   .539  -.194   .136   -1.211  |    .272
          .356 |   .582   .153   .066    .      |   1.158

     E    .390   .222   .018   .431 |   1.061
          Regression coefficients

* Example from Godfrey H. Thomson, The factorial analysis of human ability, 5th ed.,
University of London Press, Ltd., London, 1951.


The steps in the calculation are as follows:

1. Write down the matrix of intercorrelations between the predictors,
that is, between variables $X_2$, $X_3$, $X_4$, and $X_5$. Insert 1's along the diagonal.
Beneath this matrix write a row containing the correlations of the predictors
with the criterion. The resulting matrix is shown to the left of slab A in
Table 19.3.

2. To the right of the above matrix record another matrix with -1's down
the diagonal. All other elements are zero, including those in the bottom
row. In Table 19.3 a dot represents a zero.

3. Sum the rows to obtain the values in the check column.

4. Calculate the differences between cross products for the first two rows
of slab A, using the 1 in the top left cell as the pivotal element. Thus the
following product differences are formed:


      1 × 1 - .69 × .69 = .524
      1 × .38 - .49 × .69 = .042
      1 × .19 - .39 × .69 = -.079
      1 × 0 - (-1) × .69 = .690
      1 × (-1) - 0 × .69 = -1
      1 × 0 - 0 × .69 = 0
      1 × 0 - 0 × .69 = 0


These values are recorded in the first row of slab B. The check value is
obtained by forming the product difference

      1 × 1.26 - 1.57 × .69 = .177

If the calculation is correct to this point, the sum of elements in the first row
of slab B will equal the product difference .177.

5. Beneath the first row of slab B write a second version of it obtained by
dividing each element by the top left element, .524. The result is a row
with unity as the pivot. This assists subsequent calculation. This part
of the procedure is most readily accomplished by multiplying the elements
in the row by the reciprocal of .524, or by 1.908.

6. The remaining elements in slab B are obtained by forming product
differences using the first row of slab A with the third, fourth, and fifth
rows of slab A, successively, always using the 1 in the top left cell as the
pivotal element. Thus

      1 × .38 - .69 × .49 = .042
      1 × 1 - .49 × .49 = .760

and so on. Each row is summed to provide a check on the calculation.
The result is a reduction of the original 5 × 4 matrix of slab A to the 4 × 3
matrix of slab B.

7. The procedure is now repeated to obtain slabs C, D, and E. At each
stage, with the exception of the last, the top row in each slab is divided by
the left-hand cell value, or multiplied by the reciprocal of that value, to
obtain a second version of the top row. The appropriate reciprocal for row
C is 1.321, and for row D it is 1.211.

8. By proceeding with the calculation, the original matrix is condensed
to the cell values in slab E. These four values are the multiple regression
coefficients for predicting a standard score on the criterion from standard
scores on the four predictors.
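
The steps above can be sketched in a few lines of Python. This is a minimal illustration of pivotal condensation as described, not a reproduction of any published routine: the function name `aitken_betas` is hypothetical, the check column is omitted, and the correlations are those used in the product differences above, with .27 assumed for the correlation between $X_4$ and $X_5$.

```python
import math

def aitken_betas(R_pred, r_crit):
    """Regression weights by successive product differences (pivotal condensation)."""
    k = len(R_pred)
    # Slab A: predictor correlations bordered on the right by a matrix with
    # -1's down the diagonal, and the criterion row written beneath.
    rows = [list(R_pred[i]) + [-1.0 if i == j else 0.0 for j in range(k)]
            for i in range(k)]
    rows.append(list(r_crit) + [0.0] * k)
    while len(rows) > 1:
        top = rows[0]
        unit = [v / top[0] for v in top]          # second version of the top row, pivot = 1
        # Product difference 1*d - c*b for every remaining cell; first column drops out.
        rows = [[row[j] - row[0] * unit[j] for j in range(1, len(row))]
                for row in rows[1:]]
    return rows[0]                                # slab E

R_pred = [
    [1.00, 0.69, 0.49, 0.39],
    [0.69, 1.00, 0.38, 0.19],
    [0.49, 0.38, 1.00, 0.27],
    [0.39, 0.19, 0.27, 1.00],
]
r_crit = [0.72, 0.58, 0.41, 0.63]

betas = aitken_betas(R_pred, r_crit)
R = math.sqrt(sum(b * r for b, r in zip(betas, r_crit)))

print([round(b, 3) for b in betas], round(R, 2))
```

Because the code carries full precision through every slab, the weights may differ from a three-decimal hand calculation by a unit or so in the last place.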

In this example the regression equation for predicting the criterion from
the predictors in standard-score form is

\[
z_1' = .390z_2 + .222z_3 + .018z_4 + .431z_5
\]

No other system of weights will provide a better estimate of the criterion.
The correlations of the four predictors with the criterion are

\[
r_{12} = .72 \qquad r_{13} = .58 \qquad r_{14} = .41 \qquad r_{15} = .63
\]

By multiplying these by the corresponding regression coefficients, summing
the resulting products, and taking the square root, we obtain the multiple
correlation coefficient as follows:

\[
R = \sqrt{.390 \times .72 + .222 \times .58 + .018 \times .41 + .431 \times .63} = .83
\]


A multiple correlation coefficient is amenable to the same general type of 
interpretation as any other correlation coefficient. It is the correlation 
between a criterion variable and the weighted sum of the predictors, the 
predictors being weighted in order to maximize that correlation. 

To obtain a multiple regression equation in raw-score form we require
the means and standard deviations of Table 19.2. We may write

\[
X_1' = (.390)\frac{5.68}{15.71}X_2 + (.222)\frac{5.68}{9.92}X_3 + (.018)\frac{5.68}{6.32}X_4
     + (.431)\frac{5.68}{14.20}X_5 + A
\]

The constant $A$ is given by

\[
\begin{aligned}
A = 8.72 &- (.390)\frac{5.68}{15.71}(104.65) - (.222)\frac{5.68}{9.92}(43.22) \\
         &- (.018)\frac{5.68}{6.32}(14.98) - (.431)\frac{5.68}{14.20}(87.22) = -26.81
\end{aligned}
\]


With any substantial number of variables the calculation of multiple 
regression weights is clearly a laborious procedure and requires the use of 
modern computing devices. 


19.8. The Significance of a Multiple Correlation Coefficient 


An $F$ ratio may be used to test whether an observed multiple correlation
coefficient is significantly different from zero. The required value of $F$ is
given by the formula

\[
F = \frac{R^2}{1 - R^2} \cdot \frac{N - k - 1}{k} \tag{19.12}
\]

where $R$ = multiple correlation coefficient
      $N$ = number of observations
      $k$ = number of independent variables or predictors

The table of $F$ is entered with $df_1 = k$ and $df_2 = N - k - 1$.
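
A brief sketch of this test follows. It writes the ratio as $R^2/k$ divided by $(1 - R^2)/(N - k - 1)$, matching the degrees of freedom stated above; the sample size N = 100 is an assumed value, since the text gives none, and `f_multiple_r` is a hypothetical name.

```python
def f_multiple_r(R, n_obs, k):
    """F ratio for testing a multiple correlation against zero."""
    df1, df2 = k, n_obs - k - 1
    F = (R ** 2 / df1) / ((1 - R ** 2) / df2)
    return F, df1, df2

# R = .83 with four predictors; N = 100 is assumed for illustration only.
F, df1, df2 = f_multiple_r(0.83, 100, 4)
print(round(F, 2), df1, df2)
```

The computed ratio is then compared with the tabled critical value of $F$ for the two degrees of freedom shown.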




19.9. Some Observations on Multiple Correlation 


The techniques of multiple correlation have practical application in occu- 
pational and scholastic selection where it becomes necessary to combine 
a number of variables to provide the best possible estimate of a criterion 
measure. An appreciation of the relative contributions of the independent 
variables in predicting the criterion is not readily grasped by simple inspection
of the multiple regression coefficients. With two predictors the square
of the multiple correlation coefficient may be shown equal to

\[
R^2 = \beta_2^2 + \beta_3^2 + 2\beta_2\beta_3r_{23}
\]

Thus the predicted variance is comprised of three additive parts. $\beta_2^2$ represents
a contribution by $X_2$, $\beta_3^2$ a contribution by $X_3$, and the term $2\beta_2\beta_3r_{23}$
is a component which involves the correlation between $X_2$ and $X_3$. Thus
the evaluation of the relative contributions of the different variables is not 
a simple matter of direct comparison of the relative magnitudes of the regres- 
sion coefficients but requires also a consideration of the correlation terms. 
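
The three additive parts can be tabulated for the two-predictor example of Sec. 19.3. The sketch below uses the rounded weights .867 and -.133 and takes the correlation between the predictors as .5, the value implied by that example; the dictionary keys are labels invented for illustration.

```python
b2, b3, r23 = 0.867, -0.133, 0.5   # weights from Sec. 19.3; r23 = .5 assumed

parts = {
    "X2 alone": b2 ** 2,
    "X3 alone": b3 ** 2,
    "joint term": 2 * b2 * b3 * r23,
}
R2 = sum(parts.values())

for label, value in parts.items():
    print(label, round(value, 3))
print("R squared", round(R2, 3))
```

Here the joint term is negative, so the squared weights alone would overstate the predicted variance; this illustrates why the weights cannot be compared without regard to the correlation term.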

Frequently, in practical work, the greater part of the prediction achieved 
can be attributed to a relatively small number of variables, perhaps four or 
five or six, and the inclusion of additional variables contributes only small 
and diminishing amounts to prediction. Tests of significance may be 
applied to decide whether or not the addition of one or more variables to a 
subset of variables will significantly improve prediction. 

Investigators concerned with problems of prediction frequently attempt to 
identify independent variables which show a high correlation with the 
criterion and a low correlation with each other. If two variables have a 
fairly high correlation with the criterion and a low correlation with each 
other, both measure different aspects of the criterion and both will contribute
substantially to prediction. If two variables have a high correlation
with each other, they are measures of much the same thing, and the inclusion
of both, instead of either one or the other, will contribute little to the prediction
achieved.


EXERCISES 


1. Given the correlations $r_{12} = .70$, $r_{13} = .50$, and $r_{23} = .60$, compute $r_{12.3}$. What
   percentage of the association between variables 1 and 2 results because of the effect of
   variable 3?

2. The mean and standard deviation of a criterion variable are $\bar{X}_1 = 24.56$ and $s_1 = 4.52$.
   The means and standard deviations for two predictor variables are $\bar{X}_2 = 36.48$, $\bar{X}_3 =
   16.95$ and $s_2 = 5.49$, $s_3 = 3.66$. The correlations are $r_{12} = .10$, $r_{13} = .65$, and $r_{23} = .33$.
   Compute (a) the correlation between standard scores on the criterion and the sum of
   standard scores on the two predictors, (b) the correlation between raw scores on the
   criterion and the sum of raw scores on the two predictors, (c) the multiple regression
   equation in standard-score form, (d) the multiple regression equation in raw-score form,
   (e) the multiple correlation coefficient.

3. The following are intercorrelations between first-year university averages and five 
university entrance examinations. Means and standard deviations are also given: 


X: X: X, X; Xs X; Si 


72.61 6.56 


62.50 
58.65 


65.80 
69.75 
71.80 


Compute (a) the multiple regression equation in standard-score form, (b) the multiple
regression equation in raw-score form, (c) the multiple correlation coefficient, (d) the
multiple correlation coefficients obtained by successively dropping variables 6, 5, and 4.
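As an aid to exercises of this kind, the following is a minimal sketch of a first-order partial correlation, assuming the standard formula r12.3 = (r12 − r13·r23) / sqrt((1 − r13²)(1 − r23²)). The function name and the input values are hypothetical and are not the values given in the exercises.

```python
import math


def partial_r(r12: float, r13: float, r23: float) -> float:
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    return numerator / denominator


# Hypothetical correlations, chosen only to illustrate the arithmetic.
r12_3 = partial_r(r12=0.60, r13=0.40, r23=0.50)
print(f"r12.3 = {r12_3:.3f}")
```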




APPENDIX 


TABLES 

A. Ordinates and Areas of the Normal Curve
B. Critical Values of t
C. Critical Values of Chi Square
D. Critical Values of F
E. Transformation of r to z_r
F. Critical Values of the Correlation Coefficient
G. Critical Values of ρ, the Spearman Rank Correlation Coefficient
H. Probabilities Associated with Values as Large as Observed Values of S in the
   Kendall Rank Correlation Coefficient
I. Critical Values of T in the Wilcoxon Matched-pairs Signed-ranks Test
J. Squares and Square Roots of Numbers from 1 to 1,000




TABLE A
ORDINATES AND AREAS OF THE NORMAL CURVE*
(In terms of σ units)


  z    Area   Ordinate      z    Area   Ordinate      z    Area   Ordinate
 .00   .0000   .3989       .50   .1915   .3521      1.00   .3413   .2420
 .01   .0040   .3989       .51   .1950   .3503      1.01   .3438   .2396
 .02   .0080   .3989       .52   .1985   .3485      1.02   .3461   .2371
 .03   .0120   .3988       .53   .2019   .3467      1.03   .3485   .2347
 .04   .0160   .3986       .54   .2054   .3448      1.04   .3508   .2323
 .05   .0199   .3984       .55   .2088   .3429      1.05   .3531   .2299
 .06   .0239   .3982       .56   .2123   .3410      1.06   .3554   .2275
 .07   .0279   .3980       .57   .2157   .3391      1.07   .3577   .2251
 .08   .0319   .3977       .58   .2190   .3372      1.08   .3599   .2227
 .09   .0359   .3973       .59   .2224   .3352      1.09   .3621   .2203
 .10   .0398   .3970       .60   .2257   .3332      1.10   .3643   .2179
 .11   .0438   .3965       .61   .2291   .3312      1.11   .3665   .2155
 .12   .0478   .3961       .62   .2324   .3292      1.12   .3686   .2131
 .13   .0517   .3956       .63   .2357   .3271      1.13   .3708   .2107
 .14   .0557   .3951       .64   .2389   .3251      1.14   .3729   .2083
 .15   .0596   .3945       .65   .2422   .3230      1.15   .3749   .2059
 .16   .0636   .3939       .66   .2454   .3209      1.16   .3770   .2036
 .17   .0675   .3932       .67   .2486   .3187      1.17   .3790   .2012
 .18   .0714   .3925       .68   .2517   .3166      1.18   .3810   .1989
 .19   .0753   .3918       .69   .2549   .3144      1.19   .3830   .1965
 .20   .0793   .3910       .70   .2580   .3123      1.20   .3849   .1942
 .21   .0832   .3902       .71   .2611   .3101      1.21   .3869   .1919
 .22   .0871   .3894       .72   .2642   .3079      1.22   .3888   .1895
 .23   .0910   .3885       .73   .2673   .3056      1.23   .3907   .1872
 .24   .0948   .3876       .74   .2703   .3034      1.24   .3925   .1849
 .25   .0987   .3867       .75   .2734   .3011      1.25   .3944   .1826
 .26   .1026   .3857       .76   .2764   .2989      1.26   .3962   .1804
 .27   .1064   .3847       .77   .2794   .2966      1.27   .3980   .1781
 .28   .1103   .3836       .78   .2823   .2943      1.28   .3997   .1758
 .29   .1141   .3825       .79   .2852   .2920      1.29   .4015   .1736
 .30   .1179   .3814       .80   .2881   .2897      1.30   .4032   .1714
 .31   .1217   .3802       .81   .2910   .2874      1.31   .4049   .1691
 .32   .1255   .3790       .82   .2939   .2850      1.32   .4066   .1669
 .33   .1293   .3778       .83   .2967   .2827      1.33   .4082   .1647
 .34   .1331   .3765       .84   .2995   .2803      1.34   .4099   .1626
 .35   .1368   .3752       .85   .3023   .2780      1.35   .4115   .1604
 .36   .1406   .3739       .86   .3051   .2756      1.36   .4131   .1582
 .37   .1443   .3725       .87   .3078   .2732      1.37   .4147   .1561
 .38   .1480   .3712       .88   .3106   .2709      1.38   .4162   .1539
 .39   .1517   .3697       .89   .3133   .2685      1.39   .4177   .1518
 .40   .1554   .3683       .90   .3159   .2661      1.40   .4192   .1497
 .41   .1591   .3668       .91   .3186   .2637      1.41   .4207   .1476
 .42   .1628   .3653       .92   .3212   .2613      1.42   .4222   .1456
 .43   .1664   .3637       .93   .3238   .2589      1.43   .4236   .1435
 .44   .1700   .3621       .94   .3264   .2565      1.44   .4251   .1415
 .45   .1736   .3605       .95   .3289   .2541      1.45   .4265   .1394
 .46   .1772   .3589       .96   .3315   .2516      1.46   .4279   .1374
 .47   .1808   .3572       .97   .3340   .2492      1.47   .4292   .1354
 .48   .1844   .3555       .98   .3365   .2468      1.48   .4306   .1334
 .49   .1879   .3538       .99   .3389   .2444      1.49   .4319   .1315
 .50   .1915   .3521      1.00   .3413   .2420      1.50   .4332   .1295


* Reproduced from J. E. Wert, Educational statistics, by courtesy of McGraw-Hill Book Company, 
Inc., New York. 


TABLE A (Continued)

  z    Area   Ordinate      z   Ordinate
1.50   .4332   .1295      2.50   .0175
1.51   .4345   .1276      2.51   .0171
1.52   .4357   .1257      2.52   .0167
1.53   .4370   .1238      2.53   .0163
1.54   .4382   .1219      2.54   .0158
1.55   .4394   .1200      2.55   .0154
1.56   .4406   .1182      2.56   .0151
1.57   .4418   .1163      2.57   .0147
1.58   .4429   .1145      2.58   .0143
1.59   .4441   .1127      2.59   .0139
1.60   .4452   .1109      2.60   .0136
1.61   .4463   .1092      2.61   .0132
1.62   .4474   .1074      2.62   .0129
1.63   .4484   .1057      2.63   .0126
1.64   .4495   .1040      2.64   .0122
1.65   .4505   .1023      2.65   .0119
1.66   .4515   .1006      2.66   .0116
1.67   .4525   .0989      2.67   .0113
1.68   .4535   .0973      2.68   .0110
1.69   .4545   .0957      2.69   .0107
1.70   .4554   .0940      2.70   .0104
1.71   .4564   .0925      2.71   .0101
1.72   .4573   .0909      2.72   .0099
1.73   .4582   .0893      2.73   .0096
1.74   .4591   .0878      2.74   .0093
1.75   .4599   .0863      2.75   .0091
1.76   .4608   .0848      2.76   .0088
1.77   .4616   .0833      2.77   .0086
1.78   .4625   .0818      2.78   .0084
1.79   .4633   .0804      2.79   .0081
1.80   .4641   .0790      2.80   .0079
1.81   .4649   .0775      2.81   .0077
1.82   .4656   .0761      2.82   .0075
1.83   .4664   .0748      2.83   .0073
1.84   .4671   .0734      2.84   .0071
1.85   .4678   .0721      2.85   .0069
1.86   .4686   .0707      2.86   .0067
1.87   .4693   .0694      2.87   .0065
1.88   .4699   .0681      2.88   .0063
1.89   .4706   .0669      2.89   .0061
1.90   .4713   .0656      2.90   .0060
1.91   .4719   .0644      2.91   .0058
1.92   .4726   .0632      2.92   .0056
1.93   .4732   .0620      2.93   .0055
1.94   .4738   .0608      2.94   .0053
1.95   .4744   .0596      2.95   .0051
1.96   .4750   .0584      2.96   .0050
1.97   .4756   .0573      2.97   .0048
1.98   .4761   .0562      2.98   .0047
1.99   .4767   .0551      2.99   .0046
2.00   .4772   .0540      3.00   .0044
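Entries of Table A can also be computed directly. The sketch below assumes Python's standard-library `statistics.NormalDist`. It takes the tabled area to be the area between the mean and z, and the ordinate to be the height of the unit normal curve at z. The helper name `area_and_ordinate` is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_and_ordinate(z: float) -> tuple[float, float]:
    """Return (area from the mean to z, ordinate at z) for the unit normal curve."""
    area = STANDARD_NORMAL.cdf(z) - 0.5
    ordinate = STANDARD_NORMAL.pdf(z)
    return area, ordinate


for z in (0.50, 1.00, 1.96):
    area, ordinate = area_and_ordinate(z)
    print(f"z = {z:.2f}  area = {area:.4f}  ordinate = {ordinate:.4f}")
```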


TABLE B
CRITICAL VALUES OF t*


            Level of significance for one-tailed test
        .10      .05      .025     .01      .005     .0005
            Level of significance for two-tailed test
 df     .20      .10      .05      .02      .01      .001

  1    3.078    6.314   12.706   31.821   63.657  636.619
  2    1.886    2.920    4.303    6.965    9.925   31.598
  3    1.638    2.353    3.182    4.541    5.841   12.941
  4    1.533    2.132    2.776    3.747    4.604    8.610
  5    1.476    2.015    2.571    3.365    4.032    6.859
  6    1.440    1.943    2.447    3.143    3.707    5.959
  7    1.415    1.895    2.365    2.998    3.499    5.405
  8    1.397    1.860    2.306    2.896    3.355    5.041
  9    1.383    1.833    2.262    2.821    3.250    4.781
 10    1.372    1.812    2.228    2.764    3.169    4.587
 11    1.363    1.796    2.201    2.718    3.106    4.437
 12    1.356    1.782    2.179    2.681    3.055    4.318
 13    1.350    1.771    2.160    2.650    3.012    4.221
 14    1.345    1.761    2.145    2.624    2.977    4.140
 15    1.341    1.753    2.131    2.602    2.947    4.073
 16    1.337    1.746    2.120    2.583    2.921    4.015
 17    1.333    1.740    2.110    2.567    2.898    3.965
 18    1.330    1.734    2.101    2.552    2.878    3.922
 19    1.328    1.729    2.093    2.539    2.861    3.883
 20    1.325    1.725    2.086    2.528    2.845    3.850
 21    1.323    1.721    2.080    2.518    2.831    3.819
 22    1.321    1.717    2.074    2.508    2.819    3.792
 23    1.319    1.714    2.069    2.500    2.807    3.767
 24    1.318    1.711    2.064    2.492    2.797    3.745
 25    1.316    1.708    2.060    2.485    2.787    3.725
 26    1.315    1.706    2.056    2.479    2.779    3.707
 27    1.314    1.703    2.052    2.473    2.771    3.690
 28    1.313    1.701    2.048    2.467    2.763    3.674
 29    1.311    1.699    2.045    2.462    2.756    3.659
 30    1.310    1.697    2.042    2.457    2.750    3.646
 40    1.303    1.684    2.021    2.423    2.704    3.551
 60    1.296    1.671    2.000    2.390    2.660    3.460
120    1.289    1.658    1.980    2.358    2.617    3.373
  ∞    1.282    1.645    1.960    2.326    2.576    3.291


* Abridged from Table III of R. A. Fisher and F. Yates, Statistical tables for biological,
agricultural, and medical research, published by Oliver & Boyd, Ltd., Edinburgh, by
permission of the authors and publishers.


TABLE C
CRITICAL VALUES OF CHI SQUARE*


                    Probability under H0 that χ² ≥ chi square

df  .99     .98     .95     .90     .80     .70     .50     .30     .20     .10     .05     .02     .01     .001
 1  .00016  .00063  .0039   .016    .064    .15     .46     1.07    1.64    2.71    3.84    5.41    6.64    10.83
 2  .02     .04     .10     .21     .45     .71     1.39    2.41    3.22    4.60    5.99    7.82    9.21    13.82
 3  .12     .18     .35     .58     1.00    1.42    2.37    3.66    4.64    6.25    7.82    9.84    11.34   16.27
 4  .30     .43     .71     1.06    1.65    2.20    3.36    4.88    5.99    7.78    9.49    11.67   13.28   18.46
 5  .55     .75     1.14    1.61    2.34    3.00    4.35    6.06    7.29    9.24    11.07   13.39   15.09   20.52
 6  .87     1.13    1.64    2.20    3.07    3.83    5.35    7.23    8.56    10.64   12.59   15.03   16.81   22.46
 7  1.24    1.56    2.17    2.83    3.82    4.67    6.35    8.38    9.80    12.02   14.07   16.62   18.48   24.32
 8  1.65    2.03    2.73    3.49    4.59    5.53    7.34    9.52    11.03   13.36   15.51   18.17   20.09   26.12
 9  2.09    2.53    3.32    4.17    5.38    6.39    8.34    10.66   12.24   14.68   16.92   19.68   21.67   27.88
10  2.56    3.06    3.94    4.86    6.18    7.27    9.34    11.78   13.44   15.99   18.31   21.16   23.21   29.59
11  3.05    3.61    4.58    5.58    6.99    8.15    10.34   12.90   14.63   17.28   19.68   22.62   24.72   31.26
12  3.57    4.18    5.23    6.30    7.81    9.03    11.34   14.01   15.81   18.55   21.03   24.05   26.22   32.91
13  4.11    4.76    5.89    7.04    8.63    9.93    12.34   15.12   16.98   19.81   22.36   25.47   27.69   34.53
14  4.66    5.37    6.57    7.79    9.47    10.82   13.34   16.22   18.15   21.06   23.68   26.87   29.14   36.12
15  5.23    5.98    7.26    8.55    10.31   11.72   14.34   17.32   19.31   22.31   25.00   28.26   30.58   37.70
16  5.81    6.61    7.96    9.31    11.15   12.62   15.34   18.42   20.46   23.54   26.30   29.63   32.00   39.29
17  6.41    7.26    8.67    10.08   12.00   13.53   16.34   19.51   21.62   24.77   27.59   31.00   33.41   40.75
18  7.02    7.91    9.39    10.86   12.86   14.44   17.34   20.60   22.76   25.99   28.87   32.35   34.80   42.31
19  7.63    8.57    10.12   11.65   13.72   15.35   18.34   21.69   23.90   27.20   30.14   33.69   36.19   43.82
20  8.26    9.24    10.85   12.44   14.58   16.27   19.34   22.78   25.04   28.41   31.41   35.02   37.57   45.32
21  8.90    9.92    11.59   13.24   15.44   17.18   20.34   23.86   26.17   29.62   32.67   36.34   38.93   46.80
22  9.54    10.60   12.34   14.04   16.31   18.10   21.34   24.94   27.30   30.81   33.92   37.66   40.29   48.27
23  10.20   11.29   13.09   14.85   17.19   19.02   22.34   26.02   28.43   32.01   35.17   38.97   41.64   49.73
24  10.86   11.99   13.85   15.66   18.06   19.94   23.34   27.10   29.55   33.20   36.42   40.27   42.98   51.18
25  11.52   12.70   14.61   16.47   18.94   20.87   24.34   28.17   30.68   34.38   37.65   41.57   44.31   52.62
26  12.20   13.41   15.38   17.29   19.82   21.79   25.34   29.25   31.80   35.56   38.88   42.86   45.64   54.05
27  12.88   14.12   16.15   18.11   20.70   22.72   26.34   30.32   32.91   36.74   40.11   44.14   46.96   55.48
28  13.56   14.85   16.93   18.94   21.59   23.65   27.34   31.39   34.03   37.92   41.34   45.42   48.28   56.89
29  14.26   15.57   17.71   19.77   22.48   24.58   28.34   32.46   35.14   39.09   42.56   46.69   49.59   58.30
30  14.95   16.31   18.49   20.60   23.36   25.51   29.34   33.53   36.25   40.26   43.77   47.96   50.89   59.70


* Abridged from Table IV of R. A. Fisher and F. Yates, Statistical tables for biological,
agricultural, and medical research, published by Oliver & Boyd, Ltd., Edinburgh, by
permission of the authors and publishers.
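Two rows of the chi-square table can be checked without special software. The sketch assumes two standard facts: chi square with 1 degree of freedom is the square of a unit normal deviate, and chi square with 2 degrees of freedom has upper-tail probability exp(−x/2). The function names are hypothetical, and the approach does not extend to other degrees of freedom.

```python
import math
from statistics import NormalDist


def chi_square_critical_df1(p: float) -> float:
    """Critical value for df = 1: the square of the two-tailed normal deviate."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z


def chi_square_critical_df2(p: float) -> float:
    """Critical value for df = 2, obtained by solving exp(-x / 2) = p for x."""
    return -2 * math.log(p)


for p in (0.05, 0.001):
    print(f"p = {p}: df 1 -> {chi_square_critical_df1(p):.2f}, "
          f"df 2 -> {chi_square_critical_df2(p):.2f}")
```

At the .05 and .001 levels these computed values round to the entries tabled for 1 and 2 degrees of freedom.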


TABLE D
5 PER CENT (ROMAN TYPE) AND 1 PER CENT (BOLD-FACE TYPE) POINTS FOR THE DISTRIBUTION OF F*

[Entries of Table D are not legible in this copy. Column heading: degrees of freedom for greater mean square.]

TABLE E
TRANSFORMATION OF r TO z_r*


  r     z_r       r     z_r       r     z_r       r     z_r       r     z_r
.000   .000     .200   .203     .400   .424     .600   .693     .800  1.099
.005   .005     .205   .208     .405   .430     .605   .701     .805  1.113
.010   .010     .210   .213     .410   .436     .610   .709     .810  1.127
.015   .015     .215   .218     .415   .442     .615   .717     .815  1.142
.020   .020     .220   .224     .420   .448     .620   .725     .820  1.157
.025   .025     .225   .229     .425   .454     .625   .733     .825  1.172
.030   .030     .230   .234     .430   .460     .630   .741     .830  1.188
.035   .035     .235   .239     .435   .466     .635   .750     .835  1.204
.040   .040     .240   .245     .440   .472     .640   .758     .840  1.221
.045   .045     .245   .250     .445   .478     .645   .767     .845  1.238
.050   .050     .250   .255     .450   .485     .650   .775     .850  1.256
.055   .055     .255   .261     .455   .491     .655   .784     .855  1.274
.060   .060     .260   .266     .460   .497     .660   .793     .860  1.293
.065   .065     .265   .271     .465   .504     .665   .802     .865  1.313
.070   .070     .270   .277     .470   .510     .670   .811     .870  1.333
.075   .075     .275   .282     .475   .517     .675   .820     .875  1.354
.080   .080     .280   .288     .480   .523     .680   .829     .880  1.376
.085   .085     .285   .293     .485   .530     .685   .838     .885  1.398
.090   .090     .290   .299     .490   .536     .690   .848     .890  1.422
.095   .095     .295   .304     .495   .543     .695   .858     .895  1.447
.100   .100     .300   .310     .500   .549     .700   .867     .900  1.472
.105   .105     .305   .315     .505   .556     .705   .877     .905  1.499
.110   .110     .310   .321     .510   .563     .710   .887     .910  1.528
.115   .116     .315   .326     .515   .570     .715   .897     .915  1.557
.120   .121     .320   .332     .520   .576     .720   .908     .920  1.589
.125   .126     .325   .337     .525   .583     .725   .918     .925  1.623
.130   .131     .330   .343     .530   .590     .730   .929     .930  1.658
.135   .136     .335   .348     .535   .597     .735   .940     .935  1.697
.140   .141     .340   .354     .540   .604     .740   .950     .940  1.738
.145   .146     .345   .360     .545   .611     .745   .962     .945  1.783
.150   .151     .350   .365     .550   .618     .750   .973     .950  1.832
.155   .156     .355   .371     .555   .626     .755   .984     .955  1.886
.160   .161     .360   .377     .560   .633     .760   .996     .960  1.946
.165   .167     .365   .383     .565   .640     .765  1.008     .965  2.014
.170   .172     .370   .388     .570   .648     .770  1.020     .970  2.092
.175   .177     .375   .394     .575   .655     .775  1.033     .975  2.185
.180   .182     .380   .400     .580   .662     .780  1.045     .980  2.298
.185   .187     .385   .406     .585   .670     .785  1.058     .985  2.443
.190   .192     .390   .412     .590   .678     .790  1.071     .990  2.647
.195   .198     .395   .418     .595   .685     .795  1.085     .995  2.994


* Reprinted, by permission, from Allen L. Edwards, Statistical methods for the behavioral
sciences, Rinehart & Company, Inc., New York.
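The transformation in Table E is Fisher's z_r = ½ log_e[(1 + r)/(1 − r)], which is the inverse hyperbolic tangent of r. A minimal sketch using Python's standard `math.atanh` and `math.tanh` follows; the function names are hypothetical.

```python
import math


def r_to_z(r: float) -> float:
    """Fisher's transformation z_r = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def z_to_r(z: float) -> float:
    """Inverse transformation, recovering r from z_r."""
    return math.tanh(z)


for r in (0.200, 0.500, 0.900):
    print(f"r = {r:.3f}  z_r = {r_to_z(r):.3f}")
```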


TABLE F
CRITICAL VALUES OF THE CORRELATION COEFFICIENT*


        Level of significance for one-tailed test
         .05      .01      .005
        Level of significance for two-tailed test
 df      .10      .02      .01

  1     .988    .9995    .9999
  2     .900    .980     .990
  3     .805    .934     .959
  4     .729    .882     .917
  5     .669    .833     .874
  6     .622    .789     .834
  7     .582    .750     .798
  8     .549    .716     .765
  9     .521    .685     .735
 10     .497    .658     .708
 11     .476    .634     .684
 12     .458    .612     .661
 13     .441    .592     .641
 14     .426    .574     .623
 15     .412    .558     .606
 16     .400    .542     .590
 17     .389    .528     .575
 18     .378    .516     .561
 19     .369    .503     .549
 20     .360    .492     .537
 21     .352    .482     .526
 22     .344    .472     .515
 23     .337    .462     .505
 24     .330    .453     .496
 25     .323    .445     .487
 26     .317    .437     .479
 27     .311    .430     .471
 28     .306    .423     .463
 29     .301    .416     .456
 30     .296    .409     .449
 35     .275    .381     .418
 40     .257    .358     .393
 45     .243    .338     .372
 50     .231    .322     .354
 60     .211    .295     .325
 70     .195    .274     .303
 80     .183    .256     .283
 90     .173    .242     .267
100     .164    .230     .254


* Abridged from R. A. Fisher and F. Yates, Statistical tables for biological, agricultural,
and medical research, Oliver & Boyd, Ltd., Edinburgh, by permission of the authors and
publishers.
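Critical values of r are linked to critical values of t. Assuming the usual test t = r·sqrt(df)/sqrt(1 − r²) with df = N − 2, solving for r gives r = t/sqrt(t² + df). The sketch below applies this rearrangement to values of t taken from Table B; the function name is hypothetical.

```python
import math


def critical_r_from_t(t: float, df: int) -> float:
    """Critical correlation implied by a critical t with df = N - 2."""
    return t / math.sqrt(t * t + df)


# Critical t values for df = 10 at the two-tailed .10 and .01 levels (Table B).
print(f"{critical_r_from_t(1.812, 10):.3f}")
print(f"{critical_r_from_t(3.169, 10):.3f}")
```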


TABLE G
CRITICAL VALUES OF ρ, THE SPEARMAN RANK CORRELATION COEFFICIENT*


        Significance level (one-tailed test)
  N        .05        .01

  4      1.000
  5       .900      1.000
  6       .829       .943
  7       .714       .893
  8       .643       .833
  9       .600       .783
 10       .564       .746
 12       .506       .712
 14       .456       .645
 16       .425       .601
 18       .399       .564
 20       .377       .534
 22       .359       .508
 24       .343       .485
 26       .329       .465
 28       .317       .448
 30       .306       .432


* Adapted from E. G. Olds, Distributions of sums of squares of rank differences for small
numbers of individuals, Annals of Mathematical Statistics, 9, 133-148, 1938; The 5% significance
levels for sums of squares of rank differences and a correction, Annals of Mathematical
Statistics, 20, 117-118, 1949; with the kind permission of the author and the publisher.
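A value of ρ to compare against Table G can be obtained from two sets of ranks. The sketch assumes untied ranks and the usual formula ρ = 1 − 6Σd² / [N(N² − 1)]. The function name and the ranks are hypothetical.

```python
def spearman_rho(ranks_x: list[int], ranks_y: list[int]) -> float:
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (N*(N^2 - 1))."""
    n = len(ranks_x)
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))


# Hypothetical ranks for N = 6 individuals on two variables.
rho = spearman_rho([1, 2, 3, 4, 5, 6], [2, 1, 3, 4, 6, 5])
print(f"rho = {rho:.3f}")  # compare with the N = 6 row of Table G
```

For these illustrative ranks ρ is about .886, which exceeds the tabled .05 value for N = 6 (.829) but not the .01 value (.943).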


TABLE H
PROBABILITIES ASSOCIATED WITH VALUES AS LARGE AS OBSERVED VALUES OF S
IN THE KENDALL RANK CORRELATION COEFFICIENT*


             Values of N                           Values of N
 S    4     5      8        9          S    6      7       10
 0   .625  .592   .548     .540        1   .500   .500    .500
 2   .375  .408   .452     .460        3   .360   .386    .431
 4   .167  .242   .360     .381        5   .235   .281    .364
 6   .042  .117   .274     .306        7   .136   .191    .300
 8         .042   .199     .238        9   .068   .119    .242
10         .0083  .138     .179       11   .028   .068    .190
12                .089     .130       13   .0083  .035    .146
14                .054     .090       15   .0014  .015    .108
16                .031     .060       17          .0054   .078
18                .016     .038       19          .0014   .054
20                .0071    .022       21          .00020  .036
22                .0028    .012       23                  .023
24                .00087   .0063      25                  .014
26                .00019   .0029      27                  .0083
28                .000025  .0012      29                  .0046
30                         .00043     31                  .0023
32                         .00012     33                  .0011
34                         .000025    35                  .00047
36                         .0000028   37                  .00018
                                      39                  .000058
                                      41                  .000015
                                      43                  .0000028
                                      45                  .00000028


* Adapted by permission from M. G. Kendall, Rank correlation methods, 2d ed., Charles
Griffin & Co., Ltd., London, 1955.


TABLE I
CRITICAL VALUES OF T IN THE WILCOXON MATCHED-PAIRS SIGNED-RANKS TEST*


        Level of significance for one-tailed test
         .025      .01      .005
        Level of significance for two-tailed test
  N      .05       .02      .01

  6       0         -        -
  7       2         0        -
  8       4         2        0
  9       6         3        2
 10       8         5        3
 11      11         7        5
 12      14        10        7
 13      17        13       10
 14      21        16       13
 15      25        20       16
 16      30        24       20
 17      35        28       23
 18      40        33       28
 19      46        38       32
 20      52        43       38
 21      59        49       43
 22      66        56       49
 23      73        62       55
 24      81        69       61
 25      89        77       68


* Adapted from Table I of F. Wilcoxon, Some rapid approximate statistical procedures,
p. 13, American Cyanamid Company, New York, 1949, with the kind permission of the
author.
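The statistic T that is referred to Table I can be computed as follows. The sketch assumes the usual procedure: zero differences are dropped, absolute differences are ranked with tied values sharing the average rank, and T is the smaller of the two sums of like-signed ranks. The function name and the data are hypothetical.

```python
def wilcoxon_t(differences: list[float]) -> tuple[float, int]:
    """Return (T, N): smaller sum of like-signed ranks and number of nonzero differences."""
    nonzero = [d for d in differences if d != 0]
    ordered = sorted(nonzero, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute values starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    positive_sum = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative_sum = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(positive_sum, negative_sum), len(ordered)


# Hypothetical paired differences for eight matched pairs.
t_value, n_pairs = wilcoxon_t([8, -3, 5, 12, -1, 7, 10, 4])
print(t_value, n_pairs)
```

An observed T equal to or smaller than the tabled value for the corresponding N is significant at the stated level. With these illustrative data T = 3 for N = 8, which does not exceed the two-tailed .05 value of 4.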


TABLE J
SQUARES AND SQUARE ROOTS OF NUMBERS FROM 1 TO 1,000*


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York.


TABLE J (Continued)


Number  Square     Square root    Number  Square     Square root
 81       65 61     9.0000         121    1 46 41    11.0000
 82       67 24     9.0554         122    1 48 84    11.0454
 83       68 89     9.1104         123    1 51 29    11.0905
 84       70 56     9.1652         124    1 53 76    11.1355
 85       72 25     9.2195         125    1 56 25    11.1803
 86       73 96     9.2736         126    1 58 76    11.2250
 87       75 69     9.3274         127    1 61 29    11.2694
 88       77 44     9.3808         128    1 63 84    11.3137
 89       79 21     9.4340         129    1 66 41    11.3578
 90       81 00     9.4868         130    1 69 00    11.4018
 91       82 81     9.5394         131    1 71 61    11.4455
 92       84 64     9.5917         132    1 74 24    11.4891
 93       86 49     9.6437         133    1 76 89    11.5326
 94       88 36     9.6954         134    1 79 56    11.5758
 95       90 25     9.7468         135    1 82 25    11.6190
 96       92 16     9.7980         136    1 84 96    11.6619
 97       94 09     9.8489         137    1 87 69    11.7047
 98       96 04     9.8995         138    1 90 44    11.7473
 99       98 01     9.9499         139    1 93 21    11.7898
100     1 00 00    10.0000         140    1 96 00    11.8322
101     1 02 01    10.0499         141    1 98 81    11.8743
102     1 04 04    10.0995         142    2 01 64    11.9164
103     1 06 09    10.1489         143    2 04 49    11.9583
104     1 08 16    10.1980         144    2 07 36    12.0000
105     1 10 25    10.2470         145    2 10 25    12.0416
106     1 12 36    10.2956         146    2 13 16    12.0830
107     1 14 49    10.3441         147    2 16 09    12.1244
108     1 16 64    10.3923         148    2 19 04    12.1655
109     1 18 81    10.4403         149    2 22 01    12.2066
110     1 21 00    10.4881         150    2 25 00    12.2474
111     1 23 21    10.5357         151    2 28 01    12.2882
112     1 25 44    10.5830         152    2 31 04    12.3288
113     1 27 69    10.6301         153    2 34 09    12.3693
114     1 29 96    10.6771         154    2 37 16    12.4097
115     1 32 25    10.7238         155    2 40 25    12.4499
116     1 34 56    10.7703         156    2 43 36    12.4900
117     1 36 89    10.8167         157    2 46 49    12.5300
118     1 39 24    10.8628         158    2 49 64    12.5698
119     1 41 61    10.9087         159    2 52 81    12.6095
120     1 44 00    10.9545         160    2 56 00    12.6491


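TABLE J (Continued)


Number  Square     Square root    Number  Square     Square root
161     2 59 21    12.6886         201    4 04 01    14.1774
162     2 62 44    12.7279         202    4 08 04    14.2127
163     2 65 69    12.7671         203    4 12 09    14.2478
164     2 68 96    12.8062         204    4 16 16    14.2829
165     2 72 25    12.8452         205    4 20 25    14.3178
166     2 75 56    12.8841         206    4 24 36    14.3527
167     2 78 89    12.9228         207    4 28 49    14.3875
168     2 82 24    12.9615         208    4 32 64    14.4222
169     2 85 61    13.0000         209    4 36 81    14.4568
170     2 89 00    13.0384         210    4 41 00    14.4914
171     2 92 41    13.0767         211    4 45 21    14.5258
172     2 95 84    13.1149         212    4 49 44    14.5602
173     2 99 29    13.1529         213    4 53 69    14.5945
174     3 02 76    13.1909         214    4 57 96    14.6287
175     3 06 25    13.2288         215    4 62 25    14.6629
176     3 09 76    13.2665         216    4 66 56    14.6969
177     3 13 29    13.3041         217    4 70 89    14.7309
178     3 16 84    13.3417         218    4 75 24    14.7648
179     3 20 41    13.3791         219    4 79 61    14.7986
180     3 24 00    13.4164         220    4 84 00    14.8324
181     3 27 61    13.4536         221    4 88 41    14.8661
182     3 31 24    13.4907         222    4 92 84    14.8997
183     3 34 89    13.5277         223    4 97 29    14.9332
184     3 38 56    13.5647         224    5 01 76    14.9666
185     3 42 25    13.6015         225    5 06 25    15.0000
186     3 45 96    13.6382         226    5 10 76    15.0333
187     3 49 69    13.6748         227    5 15 29    15.0665
188     3 53 44    13.7113         228    5 19 84    15.0997
189     3 57 21    13.7477         229    5 24 41    15.1327
190     3 61 00    13.7840         230    5 29 00    15.1658
191     3 64 81    13.8203         231    5 33 61    15.1987
192     3 68 64    13.8564         232    5 38 24    15.2315
193     3 72 49    13.8924         233    5 42 89    15.2643
194     3 76 36    13.9284         234    5 47 56    15.2971
195     3 80 25    13.9642         235    5 52 25    15.3297
196     3 84 16    14.0000         236    5 56 96    15.3623
197     3 88 09    14.0357         237    5 61 69    15.3948
198     3 92 04    14.0712         238    5 66 44    15.4272
199     3 96 01    14.1067         239    5 71 21    15.4596
200     4 00 00    14.1421         240    5 76 00    15.4919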


TABLE J (Continued)


Number  Square     Square root    Number  Square     Square root
241     5 80 81    15.5242         281    7 89 61    16.7631
242     5 85 64    15.5563         282    7 95 24    16.7929
243     5 90 49    15.5885         283    8 00 89    16.8226
244     5 95 36    15.6205         284    8 06 56    16.8523
245     6 00 25    15.6525         285    8 12 25    16.8819
246     6 05 16    15.6844         286    8 17 96    16.9115
247     6 10 09    15.7162         287    8 23 69    16.9411
248     6 15 04    15.7480         288    8 29 44    16.9706
249     6 20 01    15.7797         289    8 35 21    17.0000
250     6 25 00    15.8114         290    8 41 00    17.0294
251     6 30 01    15.8430         291    8 46 81    17.0587
252     6 35 04    15.8745         292    8 52 64    17.0880
253     6 40 09    15.9060         293    8 58 49    17.1172
254     6 45 16    15.9374         294    8 64 36    17.1464
255     6 50 25    15.9687         295    8 70 25    17.1756
256     6 55 36    16.0000         296    8 76 16    17.2047
257     6 60 49    16.0312         297    8 82 09    17.2337
258     6 65 64    16.0624         298    8 88 04    17.2627
259     6 70 81    16.0935         299    8 94 01    17.2916
260     6 76 00    16.1245         300    9 00 00    17.3205
261     6 81 21    16.1555         301    9 06 01    17.3494
262     6 86 44    16.1864         302    9 12 04    17.3781
263     6 91 69    16.2173         303    9 18 09    17.4069
264     6 96 96    16.2481         304    9 24 16    17.4356
265     7 02 25    16.2788         305    9 30 25    17.4642
266     7 07 56    16.3095         306    9 36 36    17.4929
267     7 12 89    16.3401         307    9 42 49    17.5214
268     7 18 24    16.3707         308    9 48 64    17.5499
269     7 23 61    16.4012         309    9 54 81    17.5784
270     7 29 00    16.4317         310    9 61 00    17.6068
271     7 34 41    16.4621         311    9 67 21    17.6352
272     7 39 84    16.4924         312    9 73 44    17.6635
273     7 45 29    16.5227         313    9 79 69    17.6918
274     7 50 76    16.5529         314    9 85 96    17.7200
275     7 56 25    16.5831         315    9 92 25    17.7482
276     7 61 76    16.6132         316    9 98 56    17.7764
277     7 67 29    16.6433         317   10 04 89    17.8045
278     7 72 84    16.6733         318   10 11 24    17.8326
279     7 78 41    16.7033         319   10 17 61    17.8606
280     7 84 00    16.7332         320   10 24 00    17.8885


TABLE J (Continued)


Number  Square     Square root    Number  Square     Square root
321    10 30 41    17.9165         361   13 03 21    19.0000
322    10 36 84    17.9444         362   13 10 44    19.0263
323    10 43 29    17.9722         363   13 17 69    19.0526
324    10 49 76    18.0000         364   13 24 96    19.0788
325    10 56 25    18.0278         365   13 32 25    19.1050
326    10 62 76    18.0555         366   13 39 56    19.1311
327    10 69 29    18.0831         367   13 46 89    19.1572
328    10 75 84    18.1108         368   13 54 24    19.1833
329    10 82 41    18.1384         369   13 61 61    19.2094
330    10 89 00    18.1659         370   13 69 00    19.2354
331    10 95 61    18.1934         371   13 76 41    19.2614
332    11 02 24    18.2209         372   13 83 84    19.2873
333    11 08 89    18.2483         373   13 91 29    19.3132
334    11 15 56    18.2757         374   13 98 76    19.3391
335    11 22 25    18.3030         375   14 06 25    19.3649
336    11 28 96    18.3303         376   14 13 76    19.3907
337    11 35 69    18.3576         377   14 21 29    19.4165
338    11 42 44    18.3848         378   14 28 84    19.4422
339    11 49 21    18.4120         379   14 36 41    19.4679
340    11 56 00    18.4391         380   14 44 00    19.4936
341    11 62 81    18.4662         381   14 51 61    19.5192
342    11 69 64    18.4932         382   14 59 24    19.5448
343    11 76 49    18.5203         383   14 66 89    19.5704
344    11 83 36    18.5472         384   14 74 56    19.5959
345    11 90 25    18.5742         385   14 82 25    19.6214
346    11 97 16    18.6011         386   14 89 96    19.6469
347    12 04 09    18.6279         387   14 97 69    19.6723
348    12 11 04    18.6548         388   15 05 44    19.6977
349    12 18 01    18.6815         389   15 13 21    19.7231
350    12 25 00    18.7083         390   15 21 00    19.7484
351    12 32 01    18.7350         391   15 28 81    19.7737
352    12 39 04    18.7617         392   15 36 64    19.7990
353    12 46 09    18.7883         393   15 44 49    19.8242
354    12 53 16    18.8149         394   15 52 36    19.8494
355    12 60 25    18.8414         395   15 60 25    19.8746
356    12 67 36    18.8680         396   15 68 16    19.8997
357    12 74 49    18.8944         397   15 76 09    19.9249
358    12 81 64    18.9209         398   15 84 04    19.9499
359    12 88 81    18.9473         399   15 92 01    19.9750
360    12 96 00    18.9737         400   16 00 00    20.0000


TABLE J (Continued)


Number  Square     Square root    Number  Square     Square root
401    16 08 01    20.0250         441   19 44 81    21.0000
402    16 16 04    20.0499         442   19 53 64    21.0238
403    16 24 09    20.0749         443   19 62 49    21.0476
404    16 32 16    20.0998         444   19 71 36    21.0713
405    16 40 25    20.1246         445   19 80 25    21.0950
406    16 48 36    20.1494         446   19 89 16    21.1187
407    16 56 49    20.1742         447   19 98 09    21.1424
408    16 64 64    20.1990         448   20 07 04    21.1660
409    16 72 81    20.2237         449   20 16 01    21.1896
410    16 81 00    20.2485         450   20 25 00    21.2132
411    16 89 21    20.2731         451   20 34 01    21.2368
412    16 97 44    20.2978         452   20 43 04    21.2603
413    17 05 69    20.3224         453   20 52 09    21.2838
414    17 13 96    20.3470         454   20 61 16    21.3073
415    17 22 25    20.3715         455   20 70 25    21.3307
416    17 30 56    20.3961         456   20 79 36    21.3542
417    17 38 89    20.4206         457   20 88 49    21.3776
418    17 47 24    20.4450         458   20 97 64    21.4009
419    17 55 61    20.4695         459   21 06 81    21.4243
420    17 64 00    20.4939         460   21 16 00    21.4476
421    17 72 41    20.5183         461   21 25 21    21.4709
422    17 80 84    20.5426         462   21 34 44    21.4942
423    17 89 29    20.5670         463   21 43 69    21.5174
424    17 97 76    20.5913         464   21 52 96    21.5407
425    18 06 25    20.6155         465   21 62 25    21.5639
426    18 14 76    20.6398         466   21 71 56    21.5870
427    18 23 29    20.6640         467   21 80 89    21.6102
428    18 31 84    20.6882         468   21 90 24    21.6333
429    18 40 41    20.7123         469   21 99 61    21.6564
430    18 49 00    20.7364         470   22 09 00    21.6795
431    18 57 61    20.7605         471   22 18 41    21.7025
432    18 66 24    20.7846         472   22 27 84    21.7256
433    18 74 89    20.8087         473   22 37 29    21.7486
434    18 83 56    20.8327         474   22 46 76    21.7715
435    18 92 25    20.8567         475   22 56 25    21.7945
436    19 00 96    20.8806         476   22 65 76    21.8174
437    19 09 69    20.9045         477   22 75 29    21.8403
438    19 18 44    20.9284         478   22 84 84    21.8632
439    19 27 21    20.9523         479   22 94 41    21.8861
440    19 36 00    20.9762         480   23 04 00    21.9089


TABLE J (Continued)


Number  Square     Square root    Number  Square     Square root
481    23 13 61    21.9317         521   27 14 41    22.8254
482    23 23 24    21.9545         522   27 24 84    22.8473
483    23 32 89    21.9773         523   27 35 29    22.8692
484    23 42 56    22.0000         524   27 45 76    22.8910
485    23 52 25    22.0227         525   27 56 25    22.9129
486    23 61 96    22.0454         526   27 66 76    22.9347
487    23 71 69    22.0681         527   27 77 29    22.9565
488    23 81 44    22.0907         528   27 87 84    22.9783
489    23 91 21    22.1133         529   27 98 41    23.0000
490    24 01 00    22.1359         530   28 09 00    23.0217
491    24 10 81    22.1585         531   28 19 61    23.0434
492    24 20 64    22.1811         532   28 30 24    23.0651
493    24 30 49    22.2036         533   28 40 89    23.0868
494    24 40 36    22.2261         534   28 51 56    23.1084
495    24 50 25    22.2486         535   28 62 25    23.1301
496    24 60 16    22.2711         536   28 72 96    23.1517
497    24 70 09    22.2935         537   28 83 69    23.1733
498    24 80 04    22.3159         538   28 94 44    23.1948
499    24 90 01    22.3383         539   29 05 21    23.2164
500    25 00 00    22.3607         540   29 16 00    23.2379
501    25 10 01    22.3830         541   29 26 81    23.2594
502    25 20 04    22.4054         542   29 37 64    23.2809
503    25 30 09    22.4277         543   29 48 49    23.3024
504    25 40 16    22.4499         544   29 59 36    23.3238
505    25 50 25    22.4722         545   29 70 25    23.3452
506    25 60 36    22.4944         546   29 81 16    23.3666
507    25 70 49    22.5167         547   29 92 09    23.3880
508    25 80 64    22.5389         548   30 03 04    23.4094
509    25 90 81    22.5610         549   30 14 01    23.4307
510    26 01 00    22.5832         550   30 25 00    23.4521
511    26 11 21    22.6053         551   30 36 01    23.4734
512    26 21 44    22.6274         552   30 47 04    23.4947
513    26 31 69    22.6495         553   30 58 09    23.5160
514    26 41 96    22.6716         554   30 69 16    23.5372
515    26 52 25    22.6936         555   30 80 25    23.5584
516    26 62 56    22.7156         556   30 91 36    23.5797
517    26 72 89    22.7376         557   31 02 49    23.6008
518    26 83 24    22.7596         558   31 13 64    23.6220
519    26 93 61    22.7816         559   31 24 81    23.6432
520    27 04 00    22.8035         560   31 36 00    23.6643


326 Appendix
TABLE J (Continued)


Number Square Square root Number Square Square root


561 31 47 21 23.6854 601 36 12 01 24.5153
562 31 58 44 23.7065 602 36 24 04 24.5357
563 31 69 69 23.7276 603 36 36 09 24.5561
564 31 80 96 23.7487 604 36 48 16 24.5764
565 31 92 25 23.7697 605 36 60 25 24.5967
566 32 03 56 23.7908 606 36 72 36 24.6171
567 32 14 89 23.8118 607 36 84 49 24.6374
568 32 26 24 23.8328 608 36 96 64 24.6577
569 32 37 61 23.8537 609 37 08 81 24.6779
570 32 49 00 23.8747 610 37 21 00 24.6982
571 32 60 41 23.8956 611 37 33 21
572 32 71 84 23.9165 612 37 45 44
573 32 83 29 23.9374 613 37 57 69
574 32 94 76 23.9583 614 37 69 96
575 33 06 25 23.9792 615 37 82 25
576 33 17 76 24.0000 616 37 94 56
577 33 29 29 24.0208 617 38 06 89
578 33 40 84 24.0416 618 38 19 24
579 33 52 41 24.0624 619 38 31 61
580 33 64 00 24.0832 620 38 44 00
581 33 75 61 24.1039 621 38 56 41
582 33 87 24 24.1247 622 38 68 84
583 33 98 89 24.1454 623 38 81 29
584 34 10 56 24.1661 624 38 93 76
585 34 22 25 24.1868 625 39 06 25
586 34 33 96 24.2074 626 39 18 76
587 34 45 69 24.2281 627 39 31 29
588 34 57 44 24.2487 628 39 43 84
589 34 69 21 24.2693 629 39 56 41
590 34 81 00 24.2899 630 39 69 00
591 34 92 81 24.3105 631 39 81 61
592 35 04 64 24.3311 632 39 94 24
593 35 16 49 24.3516 633 40 06 89
594 35 28 36 24.3721 634 40 19 56
595 35 40 25 24.3926 635 40 32 25
596 35 52 16 24.4131 636 40 44 96
597 35 64 09 24.4336 637 40 57 69
598 35 76 04 24.4540 638 40 70 44
599 35 88 01 24.4745 639 40 83 21
600 36 00 00 24.4949 640 40 96 00


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York.


Appendix 327
TABLE J (Continued)


Number Square Square root Number Square Square root


641 41 08 81 25.3180 681 46 37 61 26.0960
642 41 21 64 25.3377 682 46 51 24 26.1151
643 41 34 49 25.3574 683 46 64 89 26.1343
644 41 47 36 25.3772 684 46 78 56 26.1534
645 41 60 25 25.3969 685 46 92 25 26.1725
646 41 73 16 25.4165 686 47 05 96 26.1916
647 41 86 09 25.4362 687 47 19 69 26.2107
648 41 99 04 25.4558 688 47 33 44 26.2298
649 42 12 01 25.4755 689 47 47 21 26.2488
650 42 25 00 25.4951 690 47 61 00 26.2679
651 42 38 01 25.5147 691 47 74 81 26.2869
652 42 51 04 25.5343 692 47 88 64 26.3059
653 42 64 09 25.5539 693 48 02 49 26.3249
654 42 77 16 25.5734 694 48 16 36 26.3439
655 42 90 25 25.5930 695 48 30 25 26.3629
656 43 03 36 25.6125 696 48 44 16 26.3818
657 43 16 49 25.6320 697 48 58 09 26.4008
658 43 29 64 25.6515 698 48 72 04 26.4197
659 43 42 81 25.6710 699 48 86 01 26.4386
660 43 56 00 25.6905 700 49 00 00 26.4575
661 43 69 21 25.7099 701 49 14 01 26.4764
662 43 82 44 25.7294 702 49 28 04 26.4953
663 43 95 69 25.7488 703 49 42 09 26.5141
664 44 08 96 25.7682 704 49 56 16 26.5330
665 44 22 25 25.7876 705 49 70 25 26.5518
666 44 35 56 25.8070 706 49 84 36 26.5707
667 44 48 89 25.8263 707 49 98 49 26.5895
668 44 62 24 25.8457 708 50 12 64 26.6083
669 44 75 61 25.8650 709 50 26 81 26.6271
670 44 89 00 25.8844 710 50 41 00 26.6458
671 45 02 41 25.9037 711 50 55 21 26.6646
672 45 15 84 25.9230 712 50 69 44 26.6833
673 45 29 29 25.9422 713 50 83 69 26.7021
674 45 42 76 25.9615 714 50 97 96 26.7208
675 45 56 25 25.9808 715 51 12 25 26.7395
676 45 69 76 26.0000 716 51 26 56 26.7582
677 45 83 29 26.0192 717 51 40 89 26.7769
678 45 96 84 26.0384 718 51 55 24 26.7955
679 46 10 41 26.0576 719 51 69 61 26.8142
680 46 24 00 26.0768 720 51 84 00 26.8328


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York.


328 Appendix
TABLE J (Continued)


Number Square Square root Number Square Square root


721 51 98 41 26.8514 761 57 91 21 27.5862
722 52 12 84 26.8701 762 58 06 44 27.6043
723 52 27 29 26.8887 763 58 21 69 27.6225
724 52 41 76 26.9072 764 58 36 96 27.6405
725 52 56 25 26.9258 765 58 52 25 27.6586
726 52 70 76 26.9444 766 58 67 56 27.6767
727 52 85 29 26.9629 767 58 82 89 27.6948
728 52 99 84 26.9815 768 58 98 24 27.7128
729 53 14 41 27.0000 769 59 13 61 27.7308
730 53 29 00 27.0185 770 59 29 00 27.7489
731 53 43 61 27.0370 771 59 44 41 27.7669
732 53 58 24 27.0555 772 59 59 84 27.7849
733 53 72 89 27.0740 773 59 75 29 27.8029
734 53 87 56 27.0924 774 59 90 76 27.8209
735 54 02 25 27.1109 775 60 06 25 27.8388
736 54 16 96 27.1293 776 60 21 76 27.8568
737 54 31 69 27.1477 777 60 37 29 27.8747
738 54 46 44 27.1662 778 60 52 84 27.8927
739 54 61 21 27.1846 779 60 68 41 27.9106
740 54 76 00 27.2029 780 60 84 00 27.9285
741 54 90 81 27.2213 781 60 99 61 27.9464
742 55 05 64 27.2397 782 61 15 24 27.9643
743 55 20 49 27.2580 783 61 30 89 27.9821
744 55 35 36 27.2764 784 61 46 56 28.0000
745 55 50 25 27.2947 785 61 62 25 28.0179
746 55 65 16 27.3130 786 61 77 96 28.0357
747 55 80 09 27.3313 787 61 93 69 28.0535
748 55 95 04 27.3496 788 62 09 44 28.0713
749 56 10 01 27.3679 789 62 25 21 28.0891
750 56 25 00 27.3861 790 62 41 00 28.1069
751 56 40 01 27.4044 791 62 56 81 28.1247
752 56 55 04 27.4226 792 62 72 64 28.1425
753 56 70 09 27.4408 793 62 88 49 28.1603
754 56 85 16 27.4591 794 63 04 36 28.1780
755 57 00 25 27.4773 795 63 20 25 28.1957
756 57 15 36 27.4955 796 63 36 16 28.2135
757 57 30 49 27.5136 797 63 52 09 28.2312
758 57 45 64 27.5318 798 63 68 04 28.2489
759 57 60 81 27.5500 799 63 84 01 28.2666
760 57 76 00 27.5681 800 64 00 00 28.2843


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York.


Appendix 329
TABLE J (Continued)


Number Square Square root Number Square Square root


801 64 16 01 28.3019 841 70 72 81 29.0000
802 64 32 04 28.3196 842 70 89 64 29.0172
803 64 48 09 28.3373 843 71 06 49 29.0345
804 64 64 16 28.3549 844 71 23 36 29.0517
805 64 80 25 28.3725 845 71 40 25 29.0689
806 64 96 36 28.3901 846 71 57 16 29.0861
807 65 12 49 28.4077 847 71 74 09 29.1033
808 65 28 64 28.4253 848 71 91 04 29.1204
809 65 44 81 28.4429 849 72 08 01 29.1376
810 65 61 00 28.4605 850 72 25 00 29.1548
811 65 77 21 28.4781 851 72 42 01 29.1719
812 65 93 44 28.4956 852 72 59 04 29.1890
813 66 09 69 28.5132 853 72 76 09 29.2062
814 66 25 96 28.5307 854 72 93 16 29.2233
815 66 42 25 28.5482 855 73 10 25 29.2404
816 66 58 56 28.5657 856 73 27 36 29.2575
817 66 74 89 28.5832 857 73 44 49 29.2746
818 66 91 24 28.6007 858 73 61 64 29.2916
819 67 07 61 28.6182 859 73 78 81 29.3087
820 67 24 00 28.6356 860 73 96 00 29.3258
821 67 40 41 28.6531 861 74 13 21 29.3428
822 67 56 84 28.6705 862 74 30 44 29.3598
823 67 73 29 28.6880 863 74 47 69 29.3769
824 67 89 76 28.7054 864 74 64 96 29.3939
825 68 06 25 28.7228 865 74 82 25 29.4109
826 68 22 76 28.7402 866 74 99 56 29.4279
827 68 39 29 28.7576 867 75 16 89 29.4449
828 68 55 84 28.7750 868 75 34 24 29.4618
829 68 72 41 28.7924 869 75 51 61 29.4788
830 68 89 00 28.8097 870 75 69 00 29.4958
831 69 05 61 28.8271 871 75 86 41 29.5127
832 69 22 24 28.8444 872 76 03 84 29.5296
833 69 38 89 28.8617 873 76 21 29 29.5466
834 69 55 56 28.8791 874 76 38 76 29.5635
835 69 72 25 28.8964 875 76 56 25 29.5804
836 69 88 96 28.9137 876 76 73 76 29.5973
837 70 05 69 28.9310 877 76 91 29 29.6142
838 70 22 44 28.9482 878 77 08 84 29.6311
839 70 39 21 28.9655 879 77 26 41 29.6479
840 70 56 00 28.9828 880 77 44 00 29.6648


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York.


330 Appendix
TABLE J (Continued)


Number Square Square root Number Square Square root


881 77 61 61 29.6816 921 84 82 41 30.3480
882 77 79 24 29.6985 922 85 00 84 30.3645
883 77 96 89 29.7153 923 85 19 29 30.3809
884 78 14 56 29.7321 924 85 37 76 30.3974
885 78 32 25 29.7489 925 85 56 25 30.4138
886 78 49 96 29.7658 926 85 74 76 30.4302
887 78 67 69 29.7825 927 85 93 29 30.4467
888 78 85 44 29.7993 928 86 11 84 30.4631
889 79 03 21 29.8161 929 86 30 41 30.4795
890 79 21 00 29.8329 930 86 49 00 30.4959
891 79 38 81 29.8496 931 86 67 61 30.5123
892 79 56 64 29.8664 932 86 86 24 30.5287
893 79 74 49 29.8831 933 87 04 89 30.5450
894 79 92 36 29.8998 934 87 23 56 30.5614
895 80 10 25 29.9166 935 87 42 25 30.5778
896 80 28 16 29.9333 936 87 60 96 30.5941
897 80 46 09 29.9500 937 87 79 69 30.6105
898 80 64 04 29.9666 938 87 98 44 30.6268
899 80 82 01 29.9833 939 88 17 21 30.6431
900 81 00 00 30.0000 940 88 36 00 30.6594
901 81 18 01 30.0167 941 88 54 81 30.6757
902 81 36 04 30.0333 942 88 73 64 30.6920
903 81 54 09 30.0500 943 88 92 49 30.7083
904 81 72 16 30.0666 944 89 11 36 30.7246
905 81 90 25 30.0832 945 89 30 25 30.7409
906 82 08 36 30.0998 946 89 49 16 30.7571
907 82 26 49 30.1164 947 89 68 09 30.7734
908 82 44 64 30.1330 948 89 87 04 30.7896
909 82 62 81 30.1496 949 90 06 01 30.8058
910 82 81 00 30.1662 950 90 25 00 30.8221
911 82 99 21 30.1828 951 90 44 01 30.8383
912 83 17 44 30.1993 952 90 63 04 30.8545
913 83 35 69 30.2159 953 90 82 09 30.8707
914 83 53 96 30.2324 954 91 01 16 30.8869
915 83 72 25 30.2490 955 91 20 25 30.9031
916 83 90 56 30.2655 956 91 39 36 30.9192
917 84 08 89 30.2820 957 91 58 49 30.9354
918 84 27 24 30.2985 958 91 77 64 30.9516
919 84 45 61 30.3150 959 91 96 81 30.9677
920 84 64 00 30.3315 960 92 16 00 30.9839


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York.


Appendix 331
TABLE J (Continued)


Number Square Square root Number Square Square root


961 92 35 21 31.0000 981 96 23 61 31.3209
962 92 54 44 31.0161 982 96 43 24 31.3369
963 92 73 69 31.0322 983 96 62 89 31.3528
964 92 92 96 31.0483 984 96 82 56 31.3688
965 93 12 25 31.0644 985 97 02 25 31.3847
966 93 31 56 31.0805 986 97 21 96 31.4006
967 93 50 89 31.0966 987 97 41 69 31.4166
968 93 70 24 31.1127 988 97 61 44 31.4325
969 93 89 61 31.1288 989 97 81 21 31.4484
970 94 09 00 31.1448 990 98 01 00 31.4643
971 94 28 41 31.1609 991 98 20 81 31.4802
972 94 47 84 31.1769 992 98 40 64 31.4960
973 94 67 29 31.1929 993 98 60 49 31.5119
974 94 86 76 31.2090 994 98 80 36 31.5278
975 95 06 25 31.2250 995 99 00 25 31.5436
976 95 25 76 31.2410 996 99 20 16 31.5595
977 95 45 29 31.2570 997 99 40 09 31.5753
978 95 64 84 31.2730 998 99 60 04 31.5911
979 95 84 41 31.2890 999 99 80 01 31.6070
980 96 04 00 31.3050 1000 100 00 00 31.6228


* By permission from H. Sorenson, Statistics for students of psychology and education,
copyright 1936, McGraw-Hill Book Company, Inc., New York. 
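
Any entry of Table J can be checked, or a missing or doubtful entry recomputed, directly from its number. The following is a minimal sketch in Python, not part of the original text; the function names `group_square` and `table_j_row` are hypothetical, and the output format (square printed in two-digit groups, root rounded to four decimal places) is an assumption inferred from the rows printed above.

```python
import math


def group_square(n: int) -> str:
    """Return n squared with its last four digits split into two-digit groups."""
    s = str(n * n)
    # Leading digits, then the two trailing pairs; empty parts are dropped
    # so that small squares such as 25 print without stray spaces.
    parts = [s[:-4], s[-4:-2], s[-2:]]
    return " ".join(p for p in parts if p)


def table_j_row(n: int) -> str:
    """Format one row in the style of Table J: number, square, square root."""
    return f"{n} {group_square(n)} {math.sqrt(n):.4f}"


if __name__ == "__main__":
    # Print a short stretch of the table for comparison with the printed page.
    for number in range(961, 966):
        print(table_j_row(number))
```

A tabled root can also be checked without a program by squaring it; the result should differ from the number only in the later decimal places.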




GLOSSARY OF SYMBOLS 


For most commonly used statistics, Roman letters denote sample values and Greek
letters denote parameters. Exceptions to this are made either for convenience or in
conformity with common usage. For example, ρ denotes both the population value of the
product-moment correlation coefficient and the sample value of Spearman's rank-order
correlation coefficient. η denotes the sample value of the correlation ratio. T, not τ,
denotes a true measurement.

A bar above a symbol always indicates the arithmetic mean of a sample of observations. 
A few symbols are used with double or triple meanings. Homonyms are permissible in the 
language of mathematics as in any other. 

Some symbols with idiosyncratic use in a restricted context are not listed. 


Constant in a regression equation; used with subscripts as a_yx and a_xy; first
subscript denotes the predicted variable, second the observed variable.
Geometrically, a_yx and a_xy are distances from the origin where the regression lines
intercept the Y and X axes.
Regression weight applied to an independent variable, or predictor, in original
units; used with subscripts as b_yx and b_xy to distinguish predicted from observed
variable. Geometrically, b_yx and b_xy are the slopes of regression lines.
(1) A constant. 
(2) Denotes cth column in a set of C columns. 
(1) Number of columns. 
(2) Contingency coefficient, measure of association between nominal variables. 
Number of combinations of N things taken r at a time. 
(1) Difference between paired ranks. 
(2) Difference between the mean of a subgroup and the mean of combined 
groups, X; — X = di. 
Difference between paired measurements. 
Degrees of freedom. 
Base of Napierian logarithms, 2.7183. 
Sampling or measurement error associated with the ith value. 
(1) Expected frequency in the calculation of χ².
(2) The expectation of or expected value, as E(X) or E(X − μ)².
Frequency in a distribution; used with subscript to denote interval or subclass, 
fi. 
Marginal frequencies in bivariate distribution. 
Cell frequency in bivariate distribution. 
Ratio of two sample variances. 
Measure of skewness. 
Measure of kurtosis. 
Size of a class interval. 
Null hypothesis, as in H₀: μ₁ − μ₂ = 0.
Subscripts used to identify particular observations in a group. 
(1) Number of subclasses. 
(2) Number of times a test is lengthened. 
333 


Glossary of Symbols 


Kendall's coefficient of consistence. 

The rth moment about the arithmetic mean. 

(1) Number of observations in a subclass; used with subscript to indicate
subclass, n_i, n_j, n_ij, etc. Always used where the number of subclasses is
greater than 2.

(2) Number of test items. 

Number of observations in a sample. 

Number of members in a population. 

Observed frequency in the calculation of χ².

(1) Sample proportion in the ith class; estimate of the probability of the occur- 
rence of the ith event. 

(2) In relation to psychological tests denotes the proportion of individuals
passing item i.

Percentile point; subscript denotes particular percentile point, as P»o, Рьо, etc. 

Number of permutations of N things taken r at a time. 

Sample proportion or probability estimate in one of two mutually exclusive 

classes. 

1 − p.

(1) Sample value of the correlation coefficient; used with subscripts to denote
variables correlated, r_xy, r_12, etc.

(2) Denotes rth row in a set of R rows. 

Biserial correlation coefficient. 

Point biserial correlation coefficient.

Tetrachoric correlation coefficient.

Reliability coefficient. 

Reliability coefficient for a half test. 

Reliability coefficient for a test lengthened k times.

Partial correlation coefficient.

(1) Number of rows.

(2) Sum of ranks; used with subscript to denote sum of ranks for the jth group,
R_j.

(3) Multiple correlation coefficient.

(1) Standard deviation of a sample; used with subscript to denote variable,
s_x, s_y, etc.

(2) Estimate of the standard error of a statistic; subscript denotes statistic, 
Sf, Sp, Ss, etc, 

Variance estimate, the square of any standard deviation; used with subscripts
as indicated under s.

Sample standard deviation corrected for grouping error.

Standard error of estimate. 

Standard deviation of predicted values of Y and X.

Standard error associated with individual measurement, Xy. 

Difference between number of agreements and disagreements in the calculation
of Kendall's tau.

(1) Ratio of normally distributed variable to an estimate of the standard error
of that variable. Deviation from the origin along the base line of
distribution of t.

(2) Number of values tied at a particular rank in a set of ranks. 

(1) True value of an observation or measurement; used with subscript, Т. 

(2) In the analysis of variance denotes the sum of observations; T_j is the sum of
observations in the jth group.




Glossary of Symbols 335 


(3) Correction factor for ties in the calculation of Kendall's tau and coefficient of
concordance; used with subscript to denote variable, T_x, T_y.

Kendall's coefficient of concordance. 

Variable expressed as deviation from the arithmetic mean; subscripts denote
particular values of the variable, x_i, y_i, etc.

Variable expressed as deviation from arbitrary origin, sometimes involving a 

change in unit. Computation variable. 

Variable in original units; subscripts used to denote particular values of the
variable, X_i, Y_j, etc.

Arithmetic mean of a sample. A bar above a symbol always denotes a sample
mean.

Arbitrary origin.

Ordinate of unit normal curve. 

(1) Variable expressed in standard-score form, z = (X − X̄)/s. Subscript
used to denote variable, z_x, z_y.

(2) Deviation from origin along base line of normal curve of unit area and unit 
standard deviation. 

Transformation of the correlation coefficient to approximate normal form;
used in tests of significance on r.

Regression weight in a multiple regression equation applied to an independent
variable or predictor in standard-score form.

Correlation ratio. 

Population value of a proportion. 

Population mean; used with subscript to indicate variable, μ_x.

Ratio of the circumference of a circle to the diameter, 3.1416. 

(1) Correlation coefficient in a population. 

(2) Sample value of the rank-order correlation coefficient. 

Maximum likelihood estimate of ρ.

(1) Standard deviation of a population; used with subscript to denote variable,
σ_x, σ_y, etc.

(2) Standard error, standard deviation of sampling distribution; subscript
denotes statistics, σ_x̄, σ_p, etc.

Maximum likelihood estimate of σ.


The sum of; the operation of adding a set of variate values. Symbols above 
and below define limits of the summation. 


Kendall’s coefficient of rank correlation, tau. 
Phi coefficient, measure of fourfold point correlation. 
Chi square. 

a is greater than b.

a is less than b.

a is greater than or equal to b.

a is less than or equal to b.

a is very much greater than b.

a is very much less than b.

Is equal to. 

Is not equal to. 

Absolute value of a. 

The square root of a. 

Infinity. 
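
Several of the quantities defined in this glossary can be computed in a few lines. The sketch below is an illustration only and is not part of the original text. It covers the number of combinations and permutations of N things taken r at a time, and the standard score z = (X − X̄)/s. The function names are hypothetical, and the use of the N − 1 divisor for s (Python's `statistics.stdev`) is an assumption, since the glossary entry for s does not state the divisor.

```python
import math
import statistics


def n_combinations(n: int, r: int) -> int:
    """Number of combinations of n things taken r at a time."""
    return math.comb(n, r)


def n_permutations(n: int, r: int) -> int:
    """Number of permutations of n things taken r at a time."""
    return math.perm(n, r)


def standard_scores(values: list[float]) -> list[float]:
    """Express each value as a deviation from the sample mean in units of s.

    Assumption: s is the sample standard deviation with divisor N - 1.
    """
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


if __name__ == "__main__":
    print(n_combinations(5, 2))         # combinations of 5 things taken 2 at a time
    print(n_permutations(5, 2))         # permutations of 5 things taken 2 at a time
    print(standard_scores([2, 4, 6]))   # standard scores for a small sample
```

Standard scores computed this way always have a mean of zero; with the N − 1 divisor their sample standard deviation is one.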




REFERENCES 


Aitken, A. C. 1937. The evaluation of a certain triple-product matrix. Proceedings of 
the Royal Society of Edinburgh, 57, 172-181. 

Aspen, Alice A. 1949. Tables for use in comparisons whose accuracy involves two 
variances, separately estimated. Biometrika, 36, 290-291. 

Auble, D. 1953. Extended tables for the Mann-Whitney statistic. Bulletin of the 
Institute for Educational Research, Indiana University, vol. 1, no. 2. 

Binder, A. 1955. The choice of an error term in analysis of variance designs. Psy- 
chometrika, 20, 29-50. 

Cochran, W. G., and G. M. Cox. 1950. Experimental designs. New York: John Wiley & 
Sons, Inc. 

Comrie, L. J. (ed.). 1947. Barlow's tables of squares, cubes, square roots, cube roots and
reciprocals, 4th ed. London: E. and F. N. Spon Ltd.

Cornell, Francis G. 1956. The essentials of educational statistics. New York: John 
Wiley & Sons, Inc. 

Cronbach, L. J. 1947. Test reliability: its meaning and determination. Psychometrika, 
12, 1-16. 

———. 1951. Coefficient alpha and the internal structure of tests. Psychometrika,
16, 297-334.

Edwards, Allen L. 1954. Statistical methods for the behavioral sciences. New York: 
Rinehart & Company, Inc. 

Ferguson, George A. 1941. The reliability of mental tests. London: University of London 
Press, Ltd. 

———. 1951. A note on the Kuder-Richardson formula. Educational and Psychological
Measurement, 11, 612-615.

Finney, D. J. 1944. The application of probit analysis to the results of mental tests.
Psychometrika, 9, 31-39.

———. 1947. Probit analysis. New York: Cambridge University Press.

———. 1948. The Fisher-Yates test of significance in 2 × 2 contingency tables.
Biometrika, 35, 145-156.

Fisher, R. A. 1948. Statistical methods for research workers, 10th ed. Edinburgh: Oliver &
Boyd, Ltd.

——— and F. Yates. 1953. Statistical tables for biological, agricultural, and medical
research, 4th ed. Edinburgh: Oliver & Boyd, Ltd.

Fishman, Joshua A. 1956. A note on Jenkins’ improved method for tetrachoric r. 
Psychometrika, 21, 305. 

Freund, John E. 1952. Modern elementary statistics. Englewood Cliffs, N.J.:
Prentice-Hall, Inc.

Friedman, M. 1937. The use of ranks to avoid the assumption of normality implicit in
the analysis of variance. Journal of the American Statistical Association, 32, 675-701.

———. 1940. A comparison of alternative tests of significance for the problem of m
rankings. Annals of Mathematical Statistics, 11, 86-92.

Fryer, H. C. 1954. Elements of statistics. New York: John Wiley & Sons, Inc.
337 


338 References 


Garrett, Henry E. 1953. Statistics in psychology and education, 4th ed. New York: 
Longmans, Green & Co., Inc. 

Gourlay, Neil. 1955. F-test bias for experimental designs in educational research.
Psychometrika, 20, 227-248.

Gronow, D. G. C. 1951. Test for the significance of differences between means in two
normal populations having unequal variances. Biometrika, 38, 252-256.

Guilford, J. P. 1954. Psychometric methods, 2d ed. New York: McGraw-Hill Book
Company, Inc.

———. 1956. Fundamental statistics in psychology and education, 3d ed. New York:
McGraw-Hill Book Company, Inc.

Gulliksen, H. 1950. Theory of mental tests. New York: John Wiley & Sons, Inc. 

Jackson, R. W. B., and George A. Ferguson. 1941. Studies on the reliability of tests. 
Bulletin 12, University of Toronto, Department of Educational Research, Toronto. 

——— and ———. 1942. Manual of educational statistics. University of Toronto,
Department of Educational Research, Toronto.

Jarrett, К. Е. 1945. On the permissible coarseness of grouping. The Journal of Educa- 
tional Psychology, 36, 385-395. 

Jenkins, W. L. 1955. An improved method for tetrachoric r. Psychometrika, 20, 
253-258. 

Johnson, Palmer O. 1949. Statistical methods in research. Englewood Cliffs, N.J.:
Prentice-Hall, Inc.

——— and Robert W. B. Jackson. 1953. Introduction to statistical methods. Englewood
Cliffs, N.J.: Prentice-Hall, Inc.

Kendall, M. G. 1943. The advanced theory of statistics, vol. I. London: Charles Griffin &
Co., Ltd.

———. 1946. The advanced theory of statistics, vol. II. London: Charles Griffin & Co.,
Ltd.

———. 1955. Rank correlation methods, 2d ed. London: Charles Griffin & Co., Ltd.

Kenney, John F. 1947. Mathematics of statistics, part 1, 2d ed. Princeton, N.J.: D.
Van Nostrand Company, Inc.

——— and E. S. Keeping. 1951. Mathematics of statistics, part 2, 2d ed. Princeton,
N.J.: D. Van Nostrand Company, Inc.

——— and ———. 1954. Mathematics of statistics, part 1, 3d ed. Princeton, N.J.:
D. Van Nostrand Company, Inc.

Kruskal, W. H., and W. A. Wallis. 1952. Use of ranks in one-criterion variance analysis. 
Journal of the American Statistical Association, 47, 583-621. 

Kuder, G. F., and M. W. Richardson. 1937. The theory and estimation of test
reliability. Psychometrika, 2, 151-160.

Lacey, John I. 1956. The evaluation of autonomic responses: towards a general solution. 
Annals of the New York Academy of Sciences, 67, 123-164. 

Lindquist, E. F. 1953. Design and analysis of experiments in psychology and education.
Boston: Houghton Mifflin Company.

Lindzey, Gardner. 1954. Handbook of social psychology, vol. I. Reading, Mass.: Addi- 
son-Wesley Publishing Company. 

Lord, Frederic M. 1955a. Estimating test reliability. Educational and Psychological 
Measurement, 15, 325-336. 

————. 1955b. Sampling fluctuations resulting from the sampling of test items. Psy- 
chometrika, 20, 1-22. 

———. 1957. Do tests of the same length have the same standard errors of
measurement? Educational and Psychological Measurement, 17, 510-521.

Macmeeken, A. M. 1940. The intelligence of a representative group of Scottish children.
London: University of London Press, Ltd.


References 339 


Mann, H. B., and D. R. Whitney. 1947. On a test of whether one of two random
variables is stochastically larger than the other. Annals of Mathematical Statistics,
18, 50-60.

McNemar, Quinn. 1947. Note on the sampling error of the differences between correlated
proportions or percentages. Psychometrika, 12, 153-157.

———. 1955. Psychological statistics. New York: John Wiley & Sons, Inc.

Moses, L. E. 1952. Nonparametric statistics for psychological research. Psychological
Bulletin, 49, 122-143.

Mosteller, Frederick, and Robert R. Bush. 1954. Selected quantitative techniques. 
Handbook of social psychology, Gardner Lindzey (ed.), vol. 1, pp. 289-334, Reading, 
Mass.: Addison-Wesley Publishing Company. 

Nair, K. R. 1940. Tables of confidence intervals for the median in samples from any 
continuous population. Sankhya, 4, 551-558. (Not seen.) 

Peters, Charles C., and Walter R. Van Voorhis. 1940. Statistical procedures and their
mathematical bases. New York: McGraw-Hill Book Company, Inc.

Siegel, Sidney. 1956. Nonparametric statistics. New York: McGraw-Hill Book Com- 
pany, Inc. 

Snedecor, G. W. 1956. Statistical methods, 5th ed. Ames, Iowa: Iowa State College 
Press. 

Stevens, S. S. (ed.). 1951. Handbook of experimental psychology. New York: John 
Wiley & Sons, Inc. 

Stevens, S. S. 1957. On the psychophysical law. Psychological Review, 64, 153-181.

Tate, Merle W., and Richard C. Clelland. 1957. Nonparametric and shortcut statistics.
Danville, Ill.: The Interstate Printers and Publishers.

Thomson, Godfrey H. 1951. The factorial analysis of human ability, 5th ed. London: 
University of London Press, Ltd. 

Thurstone, L. L. 1944. A factorial study of perception. Chicago: University of Chicago
Press.

Torgerson, Warren S. 1958. Theory and methods of scaling. New York: John Wiley & 
Sons, Inc. 

Tsao, Fei. 1946. General solution of the analysis of variance and covariance in the case of 
unequal or disproportionate numbers of observations in the subclasses. Psychometrika, 
11, 107-128. 

Tukey, J. W. 1949. Comparing individual means in the analysis of variance. Bio- 
metrics, 5, 99-114. 

Walker, Helen M. 1943. Elementary statistical methods. New York: Henry Holt and 
Company, Inc. 

——— and Joseph Lev. 1953. Statistical inference. New York: Henry Holt and
Company, Inc.

Welch, B. L. 1938. The significance of the differences between two means when the
population variances are unequal. Biometrika, 29, 350-362.

———. 1947. The generalization of Student's problem when several different population
variances are involved. Biometrika, 34, 28-35.

Wilk, M. B., and O. Kempthorne. 1955. Fixed, mixed, and random models. Journal of 
the American Statistical Association, 50, 1144-1167. 

Wilks, Samuel S. 1949. Elementary statistical analysis. Princeton, N.J.: Princeton
University Press.

Woo, T. L. 1928. Dextrality and sinistrality of hand and eye, 2d memoir. Biometrika,
20A, 79-148.


INDEX 


Absolute zero, 11 

Age allowances, 225-226 

Aitken, A. C., 298 

Aitken's numerical solution, 298-301 

Alternative hypothesis, 135-136 

Analysis of variance, 227-263 
assumptions underlying, 212, 239-240,
249, 264


choice of error term, 250-252 
classification, higher, 262 
one-way, 227-241 
two-way, 242-263 
computation, one-way classification, 
234-238 
two-way classification, 253-259 
unequal numbers in subclasses, 259-261 
degrees of freedom, 232, 234, 236-239, 
245-246, 256-259 
F ratio in, 234, 238, 246, 251-253 
interaction, nature of, 247-248 
mean square (see variance estimate, 
below) 
models, finite, random, fixed, and mixed, 
248-252 
notation, 229-230, 243-244 
null hypothesis in, 234, 249 
by ranks, correlated samples, 272-274 
independent samples, 270-272 
sum of squares, between groups, 232 
within groups, 230-231 
for interaction, 244—248 
partitioning, 230-231, 244-245 
pooling, 252-253 
for two groups, 239 
with unequal numbers in subclasses, 259-261
variance estimate, expectation, 233-234,


9 
meaning, 232-234, 248-251 
one-way classification, 231-233 
two-way classification, 245-247 
Arbitrary origin, 39-42
Arithmetic mean (see Mean) 
Aspen, Alice A., 145 
Assumed mean, 40 
Attenuation, 284-285 
Auble, D., 269
Average, 37-51
(See also Mean; Median)
Average (mean) deviation, 53-54 


341 


Barlow's tables, 13 
Beta coefficient, 294-296 
Biased estimate, 122-123 
Bibliography, 337-339 
Bimodal distribution, 32, 33, 48 
Binder, A., 253 
Binet, Alfred, 3 
Binomial distribution, 71-76 
goodness of fit, 161-162 
and hypothesis testing, 75-76, 266 
kurtosis of, 74 
limiting form, 78 
mean of, 74 
related to normal curve, 83-84 
skewness, 74 
variance, 74 
Biserial correlation, 203-204 
Bivariate distribution, 93-94 


Chi square (x?), 157-177 
applied, in analysis of variance by ranks, 
271-274 
in contingency tables, 165-169 
in rank test, for k correlated samples,
273-274
for k independent samples, 270-272
in sign test, for k independent samples,
6 
for two correlated samples, 266-267 
for two independent samples, 
265-266 
computation, combining frequencies in, 
162-163, 177 
correction for continuity, 171-172 
critical values, table, 309 
defined, 158 
degrees of freedom, 159-161, 164-165, 
168, 171, 172, 176, 189, 192, 267, 271, 
273-274 
distribution, 158-161 
formulas for, 158, 169, 171, 172, 176 
for fourfold table, 168-169, 176 
one- and two-tailed tests, 175 
related, to contingency coefficient, 195 
to normal deviate, 170, 175 
to phi coefficient, 197 
to sample size, 175-176 
sampling distribution, 158-161 
small expected frequencies, 171-173 


342 Statistical Analysis in Psychology and Education


Chi square (x?), in test, of coefficient of 
consistence, 192 
of difference between proportions, 169- 
171 


of goodness of fit, 161-165 
of independence, 165-169 
of unequal and disproportionate fre- 
quencies, 259-260 
Class boundaries, 23 
Class interval, 22-25 
conventions regarding, 22 
defined, 22 
distribution of observations within, 24-25 
exact limits, 22-24 
mid-point, 24 
Cochran, W. G., 143, 145 
Coefficient, of concordance, 186-188 
formula for, 18 
related to rho, 188 
significance, 188-189 
with tied ranks, 188 
of consistence, 180-193 
formula for, 191 
significance, 192-193 
(See also Contingency coefficient ; Corre- 
lation coefficient; Phi coefficient; 
Reliability coefficient) 
Combinations, 71 
Comrie, L. J., 13 
Concordance (see Coefficient) 
Confidence interval, 121 
for correlation coefficient, 152 
for means, of large samples, 120-122
of small samples, 12
for median, 12! 
for proportion, 128 
for score or measurement, 286-287 
for standard deviation, 130 
Consistence (see Coefficient) 
Constant defined, 9 
Constant process, 3 
Contingency coefficient, 87, 194-196 
and chi square, 195 
maximum value, 195-196 
significance, 196 
Contingency table, 166 
Cornell, Francis G., 160n.
Correction, for attenuation, 284-285 
for continuity, 171-172, 266 


for grouping, 58-59, 155 
Correlation, 86-111, 179-209, 290-303 
measures of, 1, 203-204 


concordance, 186-188 

contingency coefficient, 194-196 

correlation ratios, 206-209 

Kendall's tau, 183-185 

multiple (see Multiple correlation) 

partial, 290-291 

phi coefficient, 196-199 

point biserial, 194, 199-202 

product-moment (see Product-moment 
correlation) 

rank (see Rank correlation) 

Spearman's rho, 179-181 


Correlation, measures of, tetrachoric, 194, 
206 


and prediction, 99-111 
and regression, 105-106 
of sums, 292-293 
t ratio for, 152-153, 155 
between true scores, 284-285 
variance interpretation, 107-109
Correlation coefficient, confidence interval
for, 152
effect of measurement error on, 284-285 
for multiple correlation, 295, 297, 301 
sampling distribution (see Sampling dis- 
tribution) 
standard error, 151 
tetrachoric, significance test, 206
Correlation ratios, eta (η), 206-209
related to r, 208 
significance, 209 
Cosine-pi coefficient, 205-206 
Covariance, 97 
Cox, G. M., 143, 145 
Cronbach, L. J., 289 
Cumulative distribution, 25 


Decile point, 215 
Degrees of freedom, in analysis of variance 
(see Analysis of variance) 
for chi square, 159-161, 164-165, 168, 
171, 172, 176, 189, 192, 267, 271, 
273-274 
for contingency tables, 168 
for F, 141-142, 208-209, 301 
geometric interpretation, 124 
meaning, 123-125 
for t, 126-127, 137, 139-140, 143-145, 
152-155, 183, 202, 291 
Delta scores, 224 
Descriptive statistics, 7 
Deviation, mean, 53-54 
standard (see Standard deviation) 
Deviation score, 40 
Difference (see Significance test; Standard error; t ratio)
Distribution, bimodal, 32, 33, 48 
binomial (see Binomial distribution) 
bivariate, 93-94 
chi-square, 158-161 
cumulative, 25 
F, 140-142 
frequency, 19-36


graphic representation, 26-37
J-shaped, 33, 59


normal, Jee 

properties, 31-36 

rectangular, 32, 33, 59 

sampling (see Sampling distribution) 
skewed, 31-35 


Distribution-free tests (see Nonparametric tests)
Doolittle method, 298 


Edwards, Allen L., 188, 189, 205, 238, 314n.
End values, 23 
Error, of estimate, 106-107 
grouping, 58-59, 155 
of measurement, 275-289 
effect, on correlation coefficient, 284-285
on mean, 277 
on sampling variance of mean, 284 
on variance, 277 
standard deviation, 286-288 
random, 275 
sampling, meaning, 113-115 
systematic, 276 
Estimate, biased, 122-123 
error, 106-107 
interval, 121 
meaning, 8, 112 
point, 121 
unbiased, 122-123 
of variance, 123, 137 
Eta (η) (see Correlation ratio)
Exact test of significance for fourfold table, 173-174
Expected value, 122-123, 158, 166-167,
232-233, 249-251, 268 
Experimental sampling, 114 


F ratio, 140-142
in analysis of variance, 234, 238, 246, 251- 


bias in, 252-253, 261-262 
critical values, table, 310-313 
related to t, 239 
in test, of correlation ratio, 209 
of linearity of regression, 208 
of multiple correlation coefficient, 301 
Factor analysis, 3 
Fechner, Gustav, 2 
Ferguson, George A., 20, 94n., 100n., 118n.,
281-283, 288 
Finney, D. J., 3, 174 
Fisher, R. A., 151, 173, 174, 282, 308n.,
n., 315n.
Fisher's zr transformation, 151-154, 213
Fishman, Joshua A., 206 
Fitting of line, 99-103 
Fourfold point correlation (see Phi coefficient)
Frequency, 19 
comparison (see Chi square) 
observed, 157 
theoretical, 157 
Frequency curve, 77-79 
Frequency distribution, 19-36 
(See also Distribution) 


Frequency polygon, 24, 28-29
Freund, John E., 130



Friedman, M., 188, 274 

Friedman two-way analysis of variance by 
ranks, 272-274 

Fryer, H. C., 53 

Function, meaning, 9, 77-79 


Galton, Francis, 86 
Geometric mean, 49-50 
Glossary of symbols, 333-335 
Goodness of fit, 161-165 
Gosset, W. S., 125 
Gourlay, Neil, 261, 262 
Graphs, 26-31 
Gronow, D. G. C., 145 
Grouping error, effect, on mean, 58 
on variance, 58-59 
and sampling statistics, 155
Sheppard’s correction for, 58-59 
Guilford, J. P., 205, 209, 288 
Gulliksen, H., 279, 288 


H test, one-way analysis of variance by ranks,

Harmonic mean, 50

Histogram, 27-28 

Homogeneity of variance, 138, 143-145, 240 

Homoscedasticity (see Homogeneity of variance)

Hypothesis, alternative, 135-136 

null, 132-133 
Hypothesis testing (see Significance) 


Independence, test, 165-169 

Inference, statistical, 2, 6, 7, 112 

Information, 10, 12, 

Integers, first N, in nonparametric tests, 
268-27 


in rank correlation, 179-181 
standard deviation, 60 
sum, 16-17 
sum of squares, 60 
Interaction, 247-248 
Interval, equality, 11-12 
estimate, 121 
grouping (see Class interval) 
Invariance, 213 


J-shaped distribution, 32, 33, 59 
Jackson, R. W. B., 20, 27, 94n., 100n., 118n.,
206, 282, 283, 288 
Jenkins, W. L., 206
Johnson, Palmer O., 27, 129, 130, 161, 162n.,


Keeping, E. S., 26n., 129, 261
Kempthorne, O., 249,
Kendall, M. G., 182, 183, 188, 191, 192, 317n.
Kendall's coefficient, of concordance, 186-189
of consistence, 189-193 




Kendall's tau, 183-186 
significance, table for testing, 317 

Kenney, John F., 26n., 129, 261 

Kruskal, W. H., 271 

Kruskal-Wallis one-way analysis of variance by ranks, 270-272, 274
Kuder, G. F., 280 
Kuder-Richardson formulas, 280-282, 287-
9 


8 
Kurtosis, 33, 35, 52, 64-65, 74 


Lacey, John I., 224 

Large sample statistics, 126 
Least-squares method, 100-103, 124, 297 
Leptokurtic distribution, 32, 33 

Lev, Joseph, 262, 285 

Lewis, D., 127 

Lindquist, E. F., 240 

Lindzey, Gardner, 53 


Linear regression, 99-106
Logarithmic transformation, 212
Lord, Frederic M., 279, 286, 287, 289


Macmeeken, A. Mt. 
McNemar, Quinn, 149, 162, 163n., 164. 
Mann, H. B., 269
Mann-Whitney U test, 268-269, 274 
Mean, arithmetic, 37-45 
of combined groups, 43 
defined, 37 
formulas for, 37, 38, 42 
properties, 44-45 
related to median and mode, 48-49 
sampling distribution, 115-120
weighted, 38 
assumed, 40 
geometric, 49-50
harmonic, 50
Mean deviation, 53-54 
Mean square (see Analysis of variance, variance estimate)
Median, 45-49 
confidence interval for, 129 
Median test, 265-266 
Mendel, Abbé, 161 
Mesokurtic distribution, 33 
Mode, 47-49
Models in analysis of variance, 248-252 
Moments, 63-64 
Müller, G. E., 3 
Multiple correlation, 292-302 
Aitken’s numerical solution, 298-301 
coefficient, 295, 297, 301 
Doolittle method, 298 
geometry of multiple regression, 296-297 
interpretation, E 
with more than three variables, 297-301 
regression equations, 294-297, 300-301 
sampling error, 301 
with three variables, 292-297 
Multiple regression, 292-302 




Nair, K. R., 129 
Natural origin, 11 
Nonlinear regression, 109-110, 206-208 
Nonlinearity test, 208 
Nonparametric tests, 146, 213, 264-274 
Mann-Whitney U, 268-269 
rank, 268-274 
for k correlated samples, 272-274
for k independent samples, 270-272
for two correlated samples, 269-270 
for two independent samples, 268-269 
sign, for k independent samples, 267
for two correlated samples, 266-267 
for two independent samples, 265-266 
significance, 264-274
Normal distribution curve, 77-85 
as approximation to binomial, 83-84 
area under, 80-82 
formula for, 79 
goodness of fit to, 162-164 
ordinates, 79-80 
standard score form, 79 
summary of properties, 84 
table of ordinates and areas, 306-307 
transformation to, 220-223 
Norms, 212 
Null hypothesis, in analysis of variance, 


meaning, 132-133 


Olds, E. G., 316n. 
One- and two-tailed tests, 135-136 
Ordinates of normal curve, 79-80 
Origin, arbitrary, 39-42 

change, 39. 

natural, 11 


Paired-comparisons method, 189 
Parallel tests, 279 
Parameter, 8, 112, 275 
Partial correlation, 290-291 
Pascal's triangle, 73 
Pearson, Karl, 87 
Percentiles, 214-220 
Permutations, 69-70 
Phi coefficient, 194, 196-199 
effect of marginal totals on, 198-199 
related to chi square, 197, 199 
standard error, 199 
Pivotal condensation, 298-301 
Platykurtic distribution, 32, 33 
Point biserial correlation, 194, 199-202 
Point estimate, 121 
Population, defined, 4, 112
finite, 5-6, 115-117
infinite, 5-6, 118-120 
numerical properties, 5 
Prediction, errors, 106-107 
meaning, 86 
in relation to correlation, 99-111 




Probability, addition theorem, 68-69 
as area, 78-79 
and binomial, 71-73 
curve, 79 
exact, 173-174 
as level of significance, 75, 133 
multiplication theorem, 69 
nature, 67-68 
Probits method, 3 
Product-moment correlation, 86-111 
assumptions underlying, 109-110 
computation, 91-92, 94-95 
critical values, table, 315 
deviation, 89-90 
direction, 88-90 
related to regression, 105-106 
sampling error, 158-161 
variance interpretation, 107-109 
Psychophysics, 


Random, meaning, 112 
Randomly parallel tests, 286-287
Range, 52-53 
Rank correlation, 179-193 
Kendall's tau, 183-185 
significance, 185-186 
with tied ranks, 185 
Spearman's rho, 179-182 
significance, 182-183 
with tied ranks, 181-182 
Rank tests of significance, 268-274 
Rectangular distribution, 32, 33, 59 
Regression, equation, 103, 106, 294-297, 
300-301 


linear, 99-106 
linearity test, 208 
meaning, 
multiple, 292-302 
nonlinear, 109-110, 206-208 
related to correlation, 105-106 
transformations, 223-224 
Reliability (see Error, of measurement; Reliability coefficient)
Reliability coefficient, and attenuation, 284 
defined, 278 
for difference scores, 285-286 
effect of test length on, 283 
in experimental psychology, 288 
maximum likelihood estimates, 282-283 
methods of determining, 278-282 
Richardson, M. W., 280 


Sample, meaning, 6-8, 112 
Sampling, random, 112 


unit, 242 
Sampling distribution, of chi square, 158- 
161 
of correlation coefficient, 150-153 
biserial, 204 
tetrachoric, 206 


of differences, 133-135 
experimental, 114 


Sampling distribution, of F, 140-141 


of mean, from finite population, 115-117
from indefinitely large population, 118-


meaning, 113-115 

of proportion, 127-128 

of score or measurement, 286-287 
of t, 125-127

theoretical, 114 


Sampling error, meaning, 113-115 


of product-moment correlation, 158-


Sampling theory, 112-136 
Scatter diagram, 88 
Sheppard's correction, 58-59 
Siegel, Sidney, 174, 274 

Sign test, 265-267 
Significance, levels, 133 


meaning, 131-133 
nonparametric tests, 264-274 
rank tests, 268-274 


Significance test, for biserial correlation,


for coefficient, of concordance, 188-189
consistence, 192-193 
correlation, 152-153 

for correlation ratio, 209 

of difference, correlated proportions, 148- 


correlated variances, 142-143 
correlation coefficients, 153-155 
independent proportions, 146-148, 
169-171 
for means, of correlated samples, 138- 
139 


of independent samples, 136-137, 
227-240 


under nonnormality, 145-146 
where variances are unequal, 143- 


for variances of independent samples, 
140-141 


exact, for fourfold table, 173-174 

of interaction, 249-252 

of Kendall’s tau, 185 

for multiple correlation coefficient, 301 
for nonlinearity, 208 

one- and two-tailed, 135-136 

for phi coefficient, 199 

for point biserial correlation, 202 
rank, 268-274 

for Spearman’s rho, 182-183 

for tetrachoric correlation coefficient, 206 


Skewness, 31-35, 64-65 

Slope of line, 101-103, 297 
Snedecor, G. W., 261, 311n.
Sorenson, H., 319-331n.
Spearman's rank coefficient, 179-183 


critical values, table, 316 


Spearman-Brown formula, 280, 281, 283 
Square root transformation, 212 
Squares and square roots, table, 319-331 
Standard deviation, 54-61 


advantages, 63 




Standard deviation, calculation, 55-58 
for combined groups, 61 
confidence interval for, 130 
effects of grouping on, 58-59 
of measurement error, 286-288 
of first N integers, 60
standard error, 129-130 
as unit of measurement, 62 
Standard error, 114-115 
of biserial correlation coefficient, 204 
of correlation coefficient, 151 
of difference, 133-135 
for correlated proportions, 149 
for independent proportions, 146-148 


fi or Жул of independent samples, 143- 


for zr's (transformed r), 154

of estimate, 106-107 

of Kendall’s tau, 185-186 

of mean, effect of grouping on, 155 
from finite population, 117 


from indefinitely large population, 119-


meaning, 114-115
of measurement, 286-288 
of median,
of percentage, 
of phi coefficient, 199 
of proportion, 127-129 
of standard deviation, 129-130 
of T in signed-rank test, 270 
of tetrachoric correlation coefficient, 206 
of zr (transformed r), 152
Standard score, and correlation, 89-90 
defined, 61 
transformation, 213-214 
Standardization of tests, 212 
Stanine scale, 223 
Statistical inference, 2, 6, 7, 112 
Statistics, descriptive, 7 
large sample, 126 
sampling, and grouping error, 155 
as study of populations, 4-6
Stevens, S. S., 10, 12
Student, 125
Sum, of integers, 16-17 
of squares (see Analysis of variance)
Summation notation explained, 15-16 
Systematic error, 276 


t distribution, 125-127 
t ratio, 125-127

assumptions underlying, 264 

in comparison of means following F test,


and confidence limits, 127 

for correlation, 152-153, 155 

critical values, table, 308 

for difference, of correlated variances, 


142-143 
of means, for correlated samples, 138-139
for independent samples, 137


unequal variances, 143-145 


t ratio, for partial correlation, 291 
for point biserial correlation, 202 
related to F, 239 
for Spearman's rho, 183 

T-score transformation, 222 

Tabular representation, rules, 26 

Tetrachoric correlation, 194, 204-206 

Thomson, G. H., 298, 299n.

Thurstone, L. L., 27n. 

Tied ranks, in coefficient of concordance, 18


in Kendall's tau, 185 
in Spearman's rho, 181-182 
Ties in rank test, for k independent samples, 27


for two independent samples, 269 
Torgerson, Warren S., 10 
Transformation, 210-226 

with age allowances, 225-226 

Fisher's zr, 151-152, 213

logarithmic, 212 

nature, 210-213 

to normal distribution, 220-223 

to percentile ranks, 214-220 

of r to zr, 151-152

rank, 213 

regression, 223-224 

square root, 212 

to standard scores, 213-214 

to stanines, 223 

to T scores, 222 
True scores, and correlation, 284-285 

defined, 275-276 

variance, 277-278, 284 
Tsao, Fei, 259 
Tukey, J. W., 238 
Two-tailed test, 135-136 


U-shaped distribution, 32, 33, 59 
U test, Mann-Whitney, 268-269, 274 
Unbiased estimate, 122-123 
of variance, 123, 137
Unit, change, 39. 
of measurement, 14, 22 
Univariate distribution, 95 
Urban, F. M., 3


Value, expected, 122-123, 158, 166-167,
232-233, 249-251, 268
Variable, computation, 41 
continuous, 10, 14, 22 
defined, 8
dependent, 9, 77
discrete, 10, 14, 22 
independent, 9, 77 
interval, 10-12 
nominal, 10-12 
ordinal, 10-12 
qualitative, 12 
quantitative, 12 
ratio, 10-12 
types, 8-13 


Variance, additive nature, 97 
analysis (see Analysis of variance) 
defined, 55 
of differences, 97 
effect of measurement error on, 277 


estimate (see Analysis of variance, variance estimate)
homogeneity, 138, 143-145, 240
sampling (see Standard error) 
of sums, 96-97 
of true scores, 277-278, 284 
unbiased estimate, 123, 137 


Walker, Helen M., 262, 285 
Wallis, W. A., 271
Weber, E. H., 2 

Welch, B. L., 143-145 
Wert, J. E., 306n.




Whitney, D. R., 269 
Wilcoxon, F., 269, 318n.
Wilcoxon matched-pairs signed-ranks test, 
269-270, 274 
critical values, table, 318 
Wilk, M. B., 249, 262 
Woo, T. L., 165 


Yates, F., 308n., 309n., 315n. 
Yates' correction for continuity, 171-172, 266

z score (see Standard score)

zr transformation, 151-152, 213
table transforming r to zr, 314 

Zero, absolute, 11 


